Updated: Dec 22, 2022
It’s a fair question. It’s probably the most common one we get asked, as it should be. We built a product that can appear like smoke and mirrors: we come in and snap some pictures, then report back with a grade for each hive. All the image processing, quality control and data analysis work takes place behind the scenes. So, how do we prove that Verifli is accurate?
Data collection is the simple answer. But the simple answer isn’t satisfying and it fails to capture the effort and complexity involved. Let’s pull back the curtain and take a look at what goes into building the Verifli hive strength model, and how our methods progressed over time.
Early beginnings of data collection
We took our first infrared hive picture in June 2018. Back then, we had A LOT of unknown factors to test, like what camera to use, how far to stand from the hive, what time of day to take images, how much data we would need, etc. At the time, we had about 25 hives that we managed for research purposes. These hives would become the first subjects of our data collection.
We’d take an image of each hive, then perform a frame count to determine the true colony size. Frame counts are a quick and easy way to get a rough colony size measurement—inspectors peek inside the hive and estimate how many frames are covered by bees. After processing the images, our model would generate an estimated colony size and we’d check the results against the frame count to determine whether the model was accurate. At first, the model performed surprisingly well—that is, until we tried imaging someone else’s hives.
Since we started by only testing on our research hives, the initial model was great at predicting the strength of those 25 hives. But beyond those 25 hives, the model was all over the place. We quickly realized that we’d need a much more diverse set of data to understand how the model performs on hives we hadn’t previously imaged. So we looked to our network of local beekeepers for help. Through the end of 2018, we’d drive out 4-5 times each month to collect hive data from different Indiana beekeepers.
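The lesson here (a model looks great on the hives it was built from and falls apart on everyone else's) is the kind of thing a grouped hold-out check is meant to catch. Below is a minimal sketch of that idea, not our actual pipeline: the record fields (`beekeeper`, `hive`, `frames`) and the values are invented for illustration. Each split holds out every hive from one beekeeper, so the model would always be scored on hives from an operation it has never seen.

```python
from collections import defaultdict


def leave_one_group_out(records, group_key="beekeeper"):
    """Yield (held_out_group, train, test) splits, one per group.

    Every record belonging to the held-out group goes into `test`;
    all remaining records go into `train`.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec)

    for held_out in sorted(groups):
        test = groups[held_out]
        train = [
            rec
            for group, recs in groups.items()
            if group != held_out
            for rec in recs
        ]
        yield held_out, train, test


# Hypothetical records: field names and numbers are made up.
records = [
    {"beekeeper": "A", "hive": 1, "frames": 8.0},
    {"beekeeper": "A", "hive": 2, "frames": 10.0},
    {"beekeeper": "B", "hive": 3, "frames": 6.0},
    {"beekeeper": "C", "hive": 4, "frames": 12.0},
]

splits = list(leave_one_group_out(records))
for held_out, train, test in splits:
    print(held_out, len(train), len(test))
```

Splitting by beekeeper rather than by individual hive is the design choice that matters: a random per-hive split would let hives from the same yard land on both sides and hide exactly the problem described above.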
Evolving our approach
In 2019, we began dedicating more time and effort to collecting quality data. We started travelling outside of Indiana to visit commercial beekeepers and collect data from their hives. We’d spend about a week with each beekeeper, and we could expect to collect data from 300 hives each day. We were rapidly bringing in new data, and things were looking up.
We made improvements to the process to reduce errors in the field and cut down on data entry time. We built a “data collection mode” in our mobile app to attach frame count data to each image file before it’s uploaded to our analysis pipeline. We performed a handful of one-off tests to refine our best practices for image capture.
Despite all the time spent re-designing our data collection methods, model accuracy was still well below our target. There was still something missing. We had set out to build a model that could objectively measure colony strength with greater accuracy than frame counts. But by comparing Verifli predictions against frame counts, the model’s potential accuracy was limited to being only as accurate as frame counts.
We needed a way to determine true colony population to understand not only the accuracy of Verifli, but also the accuracy of frame counts, which would give us a benchmark to compare against.
Reaching a breakthrough
We made the most impactful overhaul to our data collection process in 2020 by collecting additional data points and developing more precise ways to measure colony size. We added new sensors to our arsenal, giving us redundant readings to validate thermal values captured by the IR camera.
To precisely measure colony size, our solution was to collect bee weight (by weight I mean mass, if you care about semantics). If you’ve ever done frame counts, you know that bees can “hide” from an inspector by spreading out or clustering in an unexpected spot. Deceiving the human eye is relatively easy. Deceiving a scale? Good luck.
But weighing bees doesn’t come without challenges. Our first method of measuring bee mass would sometimes result in negative values. It wasn’t that we’d discovered that bees can have a negative mass (though it would be pretty cool if we had). No, it was our method that was overly complex and error-prone. To refine our approach to collecting bee weight, we sought guidance from William Meikle at the USDA Bee Lab in Tucson, AZ. After visiting William’s lab, we finally found a method that produced accurate values.
Though it takes more time and physical effort than simply counting bees on frames, collecting bee weight was the key to reaching a breakthrough. Now, rather than assuming frame counts were always accurate, we had a way to measure just how accurate they are. When compared to bee weight, we found that the model's predictions were just as accurate as frame counts on average, and in some cases more accurate than frame counts.
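To make the comparison concrete, here is a small sketch of scoring two estimates against the same reference. It assumes all three measurements have already been put on a common scale (for example, bee weight converted to an equivalent number of frames, a conversion not described in this post), and it uses mean absolute error as the metric, which is an assumption rather than a description of our analysis. The numbers are invented purely to exercise the function and are not real results.

```python
def mean_absolute_error(estimates, reference):
    """Average absolute difference between estimates and reference values."""
    if len(estimates) != len(reference):
        raise ValueError("estimates and reference must have the same length")
    if not reference:
        raise ValueError("need at least one measurement")
    return sum(abs(e - r) for e, r in zip(estimates, reference)) / len(reference)


# Made-up values for four hypothetical hives, all in "frames of bees".
reference_frames = [8.0, 10.0, 6.0, 12.0]  # derived from bee weight (assumed conversion)
frame_counts = [7.0, 11.0, 6.0, 10.0]      # inspector estimates
model_preds = [8.5, 9.0, 7.0, 11.0]        # model estimates

frame_count_mae = mean_absolute_error(frame_counts, reference_frames)
model_mae = mean_absolute_error(model_preds, reference_frames)

print(f"frame counts MAE: {frame_count_mae:.3f}")
print(f"model MAE:        {model_mae:.3f}")
```

The point of the structure is that frame counts stop being the answer key and become one more estimate to grade, which is what weighing bees made possible.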
How we do it today
Today, our data collection team (led by Kate, Kerstin and me) spends about a week each month traveling to different corners of the country to visit the generous beekeepers who allow us to gather data from their hives. Some of the places we visit may surprise you (like Alaska, Arizona and North Dakota), but each location is deliberately selected to test the model in specific weather conditions. We’ve made several adjustments to simplify our method of collecting bee weight. We continue to perform a handful of one-off experiments each trip, with the purpose of proving (or disproving) our assumptions and refining our best practices.
From our first IR picture in 2018 to today, we’ve collected data from tens of thousands of hives across 8 US states (plus Australia!). We’ve visited bee yards at sea level and at over 4,000 feet of elevation. We’ve captured images in single-digit temperatures and we’ve worked hives at temps over 105°F. We’ve spotted 2 moose, a herd of pronghorn antelope, desert lizards, and every farm animal you can imagine. I pulled an all-nighter deep in the woods of southern Indiana, taking pictures from sunset to sunrise (I did that twice actually).
Data collection has taken us to many interesting places and exposed us to a variety of different methods and philosophies of beekeeping. Data collection trips are our opportunity to improve Verifli, but they’ve also been an opportunity for our team to improve how we design experiments and develop best practices.
This was a fun post to work on because it captures the struggles of operating a startup. You create assumptions based on the information available to you, then try to achieve as much progress as possible. As soon as you gain access to better information, you refine your assumptions and adapt.
Looping back to the question at the top: how do we confirm that Verifli is accurate? The answer is 4 years of heavy lifting, long days and nights, challenging assumptions, seeking expert advice, thinking outside the box and dodging bee stings. The question can only be fully answered with the 4-year story, a story that’s still being written.