Galileo emerged from stealth with the first machine learning (ML) data intelligence platform for unstructured data that gives data scientists the ability to inspect, discover and fix critical ML data errors 10x faster across the entire ML lifecycle – from pre-training to post-training to post-production. The platform is currently in private beta with the Fortune 500 and startups across multiple industries.
“There are many MLOps platforms available on the market today, each fully capable of orchestrating the model lifecycle,” said Bradley Shimmin, Chief Analyst of AI Platforms, Analytics and Data Management. “However, when it comes to addressing the complex problem of inspecting and fixing the data — especially for unstructured data — many platforms still presume that enterprise practitioners work with data they already know and trust across the ML lifecycle. This couldn’t be further from the truth and is one of the biggest bottlenecks for ML adoption today. What they need are tools that elevate the importance of data from the outset, putting data with a capital ‘D’ back into Data Science. Galileo is tackling this critical need head on.”
More than 80% of the world’s data today is unstructured (text, image, speech, etc.) and has historically been largely untapped for ML. Recent advancements have made it easy for any data scientist to plug and play complex models for unstructured data, leading to a surge in the adoption of such models across industries.
It is common for data scientists to use spreadsheets and Python scripts to inspect and fix their unstructured training data. This ‘data detective work’ consumes more than 50% of a data scientist’s time. It is ad hoc, manual and error-prone, and it leads to poor data transparency across the organization, causing avoidable mispredictions and biases in production models.
Galileo takes a unique approach to this problem: with just a few lines of code added by the data scientist while training a model, Galileo auto-logs the data, applies advanced statistical algorithms developed by the team, and then intelligently surfaces the model’s failure points along with actions and integrations to fix them immediately, all within one platform. This cuts the time needed to proactively find critical errors in ML data across training and production models from weeks to minutes.
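To make the idea concrete, the following is a minimal sketch of what logging data during training and surfacing likely problem samples could look like. It is not Galileo's API or algorithm: the `DataLogger` and `SampleLog` classes, the `log` and `likely_errors` methods, and the simple rule of flagging samples whose labeled class receives a low average probability are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SampleLog:
    """Per-sample record kept across training epochs (hypothetical structure)."""
    text: str
    label: int
    # Probability the model assigned to the labeled class, one entry per epoch.
    label_probs: list = field(default_factory=list)


class DataLogger:
    """Hypothetical stand-in for an auto-logging hook; not Galileo's API."""

    def __init__(self):
        self.samples = {}

    def log(self, sample_id, text, label, probs):
        """Record the probability the model gave to this sample's labeled class."""
        record = self.samples.setdefault(sample_id, SampleLog(text, label))
        record.label_probs.append(probs[label])

    def likely_errors(self, threshold=0.5):
        """Return (id, mean probability) pairs below the threshold, lowest first."""
        scored = [
            (sample_id, sum(r.label_probs) / len(r.label_probs))
            for sample_id, r in self.samples.items()
        ]
        flagged = [(sid, score) for sid, score in scored if score < threshold]
        return sorted(flagged, key=lambda pair: pair[1])


# Two epochs of made-up model outputs for two text samples.
logger = DataLogger()
logger.log("a", "great movie", 1, [0.1, 0.9])
logger.log("a", "great movie", 1, [0.2, 0.8])
logger.log("b", "terrible plot", 1, [0.9, 0.1])  # label looks suspicious
logger.log("b", "terrible plot", 1, [0.8, 0.2])

for sample_id, score in logger.likely_errors():
    print(sample_id, round(score, 2))
```

In this toy run only sample "b" is flagged, because the model consistently disagrees with its label. A production system would use far more robust statistics than a mean over epochs; the sketch only shows the general pattern of logging during training and then ranking samples for review.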
Galileo goes a step further by acting as a collaborative system of record for the data scientist’s training runs, bringing transparency into how specific data and model parameter changes impact overall performance – this is key for ML teams to be truly data-driven.
“The motivation for Galileo came from our personal experiences at Apple, Google and Uber AI and from conversations with hundreds of ML teams working with unstructured data where we noticed that, while they have a long list of model-focused MLOps tools to choose from, the biggest bottleneck and time sink for high quality ML is always around fixing the data they work with. This is critical, but prohibitively manual, ad-hoc and slow, leading to poor model predictions and avoidable model biases creeping into production for the business,” said Vikram Chatterji, co-founder and CEO of Galileo. “With unstructured data across the enterprise being generated at an unprecedented scale and now rapidly leveraged for ML, we are building Galileo with the goal of being the intelligent data bench for data scientists to systematically and quickly inspect, fix and track their ML data in one place.”
Galileo Founded by Engineering Leaders from Apple, Google and Uber AI
The co-founding team at Galileo spent more than a decade building ML products, where they faced first-hand the huge challenges that ML with unstructured data presents.
- Vikram Chatterji (CEO) led product management at Google AI, where his team worked on building models with unstructured data but spent weeks analyzing the data across the ML workflow, often using Google Sheets and scripts. This was a massive under-utilization of an expensive resource (the data scientist) and led to poor model outcomes due to ad hoc tooling.
- Atindriyo Sanyal (CTO) led engineering at Uber AI (Michelangelo) where he was a co-architect of Uber’s feature store and spearheaded company-wide ML data quality tooling, leading to huge prediction performance improvements across thousands of production models. He was also an early member of the Siri team at Apple where he built foundational infrastructure for better ML data management.
- Yash Sheth (VP of Engineering) led the speech recognition platform team at Google. He was instrumental in growing speech recognition 800x across more than 20 consumer products at Google and across thousands of businesses globally through their cloud speech API.
Galileo Focused on Data-Driven ML Research
Half of the Galileo team comprises researchers from Apple, Google and Stanford AI who are focused on pushing the envelope of data-centric research that is then baked into the Galileo platform for any ML team to leverage. The other half of the team is focused on building novel systems that can perform extremely low-latency, in-memory computations on millions of data points using minimal system resources. This combination allows Galileo customers to get quick, intelligent data insights throughout the entire ML workflow.
Galileo Raises $5.1 Million in Seed Funding
Galileo also announced that it has raised $5.1 million in seed funding. The Factory led the round and Anthony Goldbloom (co-founder and CEO at Kaggle) and other angel investors also participated. Company advisers include Amy Chang (Disney, P&G board member) and Pete Warden (one of the creators of TensorFlow).
“Finding and fixing data errors is one of the biggest impediments for effective ML across the enterprise. The founders of Galileo felt this pain themselves while leading ML products at Apple, Google and Uber,” said Andy Jacques, investor at The Factory and Galileo board member. “Galileo has built an incredible team, made product innovations across the stack and created a first of its kind ML data intelligence platform. It has been exciting to see rapid market adoption and positive reactions with one of the customers even calling the product ‘magic’!”
The company plans to use the funding to hire across all departments and to accelerate research and development. The goal is to meet industry demand for a purpose-built product that finds and fixes ML data blind spots across the workflow for teams working with unstructured data.
To read Chatterji, Sanyal and Sheth’s blog on ML data intelligence, simply go to: https://www.rungalileo.io/blog/introducing-ml-data-intelligence