Always Prototype, Always Learn: Part Two
Where We Left Off
In Part One, we spent months building an ML model from scratch with our intern Megan. We dealt with messy data, 200-300% engagement rate outliers, and an EDA doom loop. We got our model to R² = 0.72—not perfect, but a start.
Then Megan's internship ended. The project could have died there.
Enter Levi.
A New Perspective
Levi picked up where Megan left off, but he brought a different angle. Instead of deploying what we'd built, he asked: "What if we tried AutoML? What if we let the system explore more algorithms and hyperparameters than we could manually?"
We had a working model, but Levi's suggestion made us pause. Maybe our manual approach wasn't the only option.
AutoML: Starting Over
Levi introduced us to AutoML (automated machine learning), and we decided to re-run the entire training process with it.
Training Comparison
What changed:
- Manual: 8 weeks of feature engineering + manual model selection
- AutoML: 2 weeks of automated process + systematic exploration
Results comparison:
- Manual model: R² = 0.72
- AutoML model: R² = 0.81
The AutoML model beat our manual one by 0.09 R² with significantly less manual effort. It explored algorithms we hadn't considered and tuned hyperparameters more systematically than we could have by hand.
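To make the comparison concrete, here is a minimal sketch of the loop that AutoML tools automate: score every candidate on the same held-out data and keep the best. This is not the AutoML system we used. The function names, the two toy candidates, and the validation data are all invented for illustration.

```python
from statistics import fmean


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


def pick_best(candidates, x_val, y_val):
    """Score each candidate on the same validation set; return the best name and all scores."""
    scores = {
        name: r_squared(y_val, [predict(x) for x in x_val])
        for name, predict in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical validation data and candidate models (assumptions, not our real data).
x_val = [1, 2, 3, 4]
y_val = [2.1, 3.9, 6.2, 7.8]
candidates = {
    "mean_baseline": lambda x: 5.0,   # always predicts the average target
    "linear": lambda x: 2.0 * x,      # a simple fitted line
}

best, scores = pick_best(candidates, x_val, y_val)
print(best, scores)
```

A real AutoML run widens the candidate set to many algorithms and hyperparameter settings, but the selection rule is the same: one metric, one shared validation set, no favorites.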
Key Insight
This was a notable moment. We'd spent months manually tuning a model, and AutoML achieved better results in a fraction of the time. The learning wasn't just about building a model—it was about knowing when to use the right tools.
Getting It to Staging
With the AutoML model validated, we moved to deployment.
Deployment Timeline
| Phase | Duration | Lead |
|---|---|---|
| Feature Brainstorming | 2 weeks | Megan |
| Data Cleaning & EDA | 8 weeks | Megan |
| Model Training (Manual) | 6 weeks | Megan |
| Handoff & Code Review | 1 week | Megan + Levi |
| AutoML Re-training | 2 weeks | Levi |
| Kubernetes Deployment | 4 weeks | Levi |
| UI Development | 3 weeks | Levi |
| Staging Deployment | 2 weeks | Levi |
From Notebook to Staging Server
With Levi leading the work, we focused on deployment fundamentals:
- Kubernetes Deployment: Containerizing the model with proper resource allocation
- Staging Environment: Setting up a safe space to test before production
- CI/CD Pipeline: Automating the build and deployment process
- Model Registry: Versioning our models for traceability
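As a rough illustration of what "proper resource allocation" looks like, the sketch below builds a Kubernetes Deployment manifest as a Python dictionary and prints it as JSON. It is not our actual manifest. The app name, image URL, port, and resource figures are placeholders chosen for the example.

```python
import json


def build_deployment(name, image, replicas=1, port=8080,
                     cpu_request="250m", cpu_limit="1",
                     mem_request="512Mi", mem_limit="1Gi"):
    """Return an apps/v1 Deployment manifest with explicit resource requests and limits."""
    labels = {"app": name}
    container = {
        "name": name,
        "image": image,  # a pinned tag keeps the deployed model version traceable
        "ports": [{"containerPort": port}],
        "resources": {
            # requests: what the scheduler reserves for the pod
            "requests": {"cpu": cpu_request, "memory": mem_request},
            # limits: the ceiling the container may not exceed
            "limits": {"cpu": cpu_limit, "memory": mem_limit},
        },
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # the selector must match the pod template labels
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


# Hypothetical name and image, for illustration only.
manifest = build_deployment(
    "engagement-model",
    "registry.example.com/engagement-model:0.2.0",
)
print(json.dumps(manifest, indent=2))
```

Kubernetes accepts manifests in JSON as well as YAML, so the printed output could be saved to a file and applied to a staging cluster. Generating it from code also lets a CI/CD pipeline stamp in the image tag for each model version.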
Not in Production Yet
To be honest about where things stand: we're not in production yet, and that's okay. The value is in the learning, the process, and proving that we can get from idea to deployed API, even if the UI is still being polished.
Key Learnings
Data is never ready
Even with a working ETL pipeline, real-world data is messy. We learned to expect the unexpected and budget time for iterative cleaning. If you're waiting for perfect data, you'll never start.
The Data Journey
| Phase | Rows | Clean Status | Notes |
|---|---|---|---|
| Raw Source | 10,000,000 | Dirty | Multiple tables, unmerged |
| Post-ETL | 8,500,000 | Dirty-ish | Basic cleaning done |
| EDA Round 1 | 7,200,000 | Dirty | Null values removed |
| EDA Round 2 | 5,800,000 | Still Dirty | Outliers filtered |
| EDA Round 3 | 2,100,000 | Medium | Joins fixed |
| MVP Dataset | 400,000 | Clean Enough | Ready for modeling |
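The row counts in the table are easier to appreciate as retention percentages. The short script below takes the figures straight from the table and computes how much each phase kept, relative to both the previous phase and the raw source. The function name is just for this example.

```python
# Row counts copied from the Data Journey table.
phases = [
    ("Raw Source", 10_000_000),
    ("Post-ETL", 8_500_000),
    ("EDA Round 1", 7_200_000),
    ("EDA Round 2", 5_800_000),
    ("EDA Round 3", 2_100_000),
    ("MVP Dataset", 400_000),
]


def retention_report(phases):
    """Return (phase, rows, % kept vs previous phase, % kept vs raw) for each phase."""
    raw_rows = phases[0][1]
    previous = raw_rows
    report = []
    for name, rows in phases:
        report.append((name, rows, 100 * rows / previous, 100 * rows / raw_rows))
        previous = rows
    return report


report = retention_report(phases)
for name, rows, step_pct, total_pct in report:
    print(f"{name:<12} {rows:>10,}  kept {step_pct:5.1f}% of previous, {total_pct:5.1f}% of raw")
```

The MVP dataset is 4% of the raw rows. The early rounds each kept 80% or more of what they received, while fixing joins kept about 36% and the final cut to the MVP dataset kept about 19%.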
Start small, start now
We didn't have perfect data, but we had enough data. The learning came from the process, not from a perfect deployment. Start where you are.
Handoffs matter
Megan built a strong foundation, and Levi took it to the finish line. Clear documentation and code reviews made the transition smooth. This reinforced that projects don't have to finish with the same person who started them.
Fresh perspectives are valuable
Levi didn't just continue what Megan started—he challenged it. The decision to re-run with AutoML was uncomfortable but ultimately led to better results. Being open to new approaches is a skill in itself.
Experience matters when teaching
Years of experience with Kubernetes and infrastructure helped guide the interns through deployment challenges they hadn't seen before. Teaching through doing is powerful.
Iteration is everything
Our biggest wins came not from the first attempt, but from the ability to go back, fix, and iterate. This mindset became invaluable across all our projects.
Business context is everything
Megan had the ML theory, but we had the business context. That combination was powerful. We didn't just build a model—we built something that could theoretically answer business questions.
The last mile is where learning happens
Deploying a model on Kubernetes and building a UI taught us more than the modeling itself. Getting something into users' hands—even if it's just an API on staging—is where the real learning happens.
The Takeaway
This project wasn't about creating the world's best ML model. It was about creating a sandbox for continuous learning—and proving that you can take something from idea to deployed API, even if it takes multiple interns, multiple approaches, and six months of persistence.
We're not in production yet. We're still tidying up the UI. But we have a staging API that works, and we're excited to take it further by showcasing the model and testing its prediction accuracy.
That's a win.
For anyone looking to upskill in machine learning: find a real project, get your hands dirty with messy data, and embrace the iterative process—even if that means starting over with a better approach. The learning happens in the spaces between perfect and good enough.
Acknowledgements
Strategic Visionaries: The business leaders who recognized that learning and prototyping create long-term value.
Domain Experts: The data engineers and analysts who understand the real-world messiness of production data.
The Engine Room:
- Megan, the first intern who laid the foundation and did the heavy lifting on data preparation and model training.
- Levi, the second intern who picked up the baton, challenged our approach, and introduced AutoML that ultimately delivered better results.
- And me, for mandating that we always have a sandbox to learn in, and for bringing the Kubernetes experience to teach along the way.
[Read Part One: The Early Days of Building the Model]
This article is part of Wired Sixth's "Intel" series on data strategy and engineering. For more on building custom intelligence tools, contact us at hello@wiredsixth.com.