Last week we started covering the story behind Exposure, the event analytics system processing big data in real time. It is one of the most notable products we’ve been working on lately and it has already been used on such notable events as the International Confex in London and the Mobile World Congress in Barcelona.
Today we continue our story. As you remember, we released the MVP, but it was decided to split the solution into two products and further improve them.
Refining the Technology
While splitting the system into two different products didn’t seem to be a complicated task, making the system faster and more stable under large amounts of data was a challenge for our RubyGarage team. Over the next year of development, we made three major improvements to allow Exposure to rapidly process much larger amounts of data.
1. Moving Analytics to Elasticsearch
At the beginning of the project, it was decided to keep the data collected from sensors in a PostgreSQL database. Not only was it faster, but it was also more flexible and overall a better fit for the required tasks than MySQL, which was used on Jeremy’s servers.
However, the more data we needed to process, the slower PostgreSQL became. So we suggested to Jeremy that we look into alternatives. Since MongoDB didn’t offer a significant increase in speed, only Elasticsearch seemed promising after initial research.
At that time Elasticsearch wasn’t as popular as it is now, and we realized there was no domain-specific language for creating requests to it. Now you can use the full-fledged JSON-based Query DSL for that, but in the middle of 2015 we were forced to write our own domain-specific language in Ruby for sending requests to Elasticsearch. Otherwise, we would have had a hard time building those JSON requests manually for each client and event.
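To illustrate why a small request builder saves effort compared with hand-writing JSON for every client and event, here is a minimal sketch. It is written in Python for brevity and is not the Ruby DSL described above. The class name, the field names (`event_id`, `device_id`, `seen_at`) and the event identifier are all hypothetical, and the JSON shape follows today's Elasticsearch Query DSL.

```python
import json


class StatsQuery:
    """Tiny chainable builder for an Elasticsearch search request body.

    All field names used here are assumptions made for illustration only.
    """

    def __init__(self):
        self._filters = []
        self._aggs = {}

    def for_event(self, event_id):
        # Restrict the search to documents of one event.
        self._filters.append({"term": {"event_id": event_id}})
        return self

    def between(self, start, end):
        # Restrict the search to a half-open time window [start, end).
        self._filters.append({"range": {"seen_at": {"gte": start, "lt": end}}})
        return self

    def unique_devices(self, name="unique_devices"):
        # Approximate count of distinct devices via a cardinality aggregation.
        self._aggs[name] = {"cardinality": {"field": "device_id"}}
        return self

    def to_body(self):
        # size=0: only the aggregations are wanted, not the matching documents.
        body = {"size": 0, "query": {"bool": {"filter": list(self._filters)}}}
        if self._aggs:
            body["aggs"] = dict(self._aggs)
        return body


body = (
    StatsQuery()
    .for_event("event-42")  # hypothetical event identifier
    .between("2015-06-01T09:00:00", "2015-06-01T18:00:00")
    .unique_devices()
    .to_body()
)
print(json.dumps(body, indent=2))
```

The resulting dictionary can be serialized and sent as the body of a search request. Each new report then becomes a short chain of method calls instead of another hand-maintained JSON document.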
Elasticsearch is great when it comes to data processing, but the more data you store there, the more time it requires to search for and process it. We decided to use Elasticsearch only when we needed to analyze the statistics, while PostgreSQL would always keep the raw data.
The main advantage of Elasticsearch is that it is extremely efficient thanks to its scalability and distributed nature. Its load balancing features allowed us to make full use of the Elasticsearch cluster resources and get the best performance for each event, no matter how long or big it was.
2. Improving Server Infrastructure
At first, we were using Digital Ocean virtual servers, and they were just fine despite offering limited functionality. But when in October 2014 we were asked to service an event with over 50,000 visitors, the amount of data became excessive, and the performance of virtual machines turned out to be the bottleneck in the system.
Thanks to our modular approach to building Exposure, we were able to migrate it to other platforms easily. At first, we switched to the dedicated IBM Softlayer bare metal servers recommended by Jeremy. Eventually, the platform used 11 servers to ensure the smooth performance of Exposure. Having more control over the computing resources also significantly increased the effectiveness and speed of the system.
However, with the implementation of Elasticsearch, we felt the need for a flexible infrastructure that would allow us to increase or decrease the number of servers on the go for quick data processing. So during the summer and autumn of 2015, we successfully redeployed Exposure to Amazon Web Services (AWS) and immediately saw a dramatic increase in the performance and flexibility of data processing.
3. Changing Data Retrieval Approach
The basic scheme that we used from the very beginning involved additional servers that would collect the data from sensors and keep it in their own MySQL database. This approach had two particular drawbacks.
First, the data was transferred to the MySQL database and only then to our PostgreSQL. The MySQL database was a redundant link in the chain, and it was also slower than PostgreSQL overall.
Second, the company owning those data-storing servers had its own limitations on the amount of data it allowed us to store and the time frame during which the data could be accessed. That was a significant problem, since Jeremy expected the gathered data to become even more valuable for clients over time: with more history they would be able to compare more stats and test their new hypotheses more precisely. So we decided to create a new, better performing system that would collect data from sensors and send it directly to PostgreSQL.
With that implemented, we managed to deploy a new version of the Exposure system that is capable of working on events of any scale, flexible enough to deliver the desired performance, and extremely stable, collecting even the largest amounts of data very quickly.
We say this not only because the improvements were aimed at that, but also because Jeremy Rollinson was able to test the new capabilities at a number of other events, just as he did at Confex: he asked for some space at an event in exchange for analytics for the event organizers. As a result, we got real feedback from real clients, which also helped us move forward in terms of functionality and business perspective.
Refining the Business Model
Splitting the product into Exposure for event organizers and Exposure for sponsors required us to determine the sets of features we could offer in each case.
As previously mentioned, sponsors would usually have a stand, a banner or a tent at the event, so only one sensor would be required for the whole system to provide the analytics. Since the experience being analyzed was similar to that of website visitors (they come, they study, they interact, and they eventually go), we assumed that the analytics should also provide the same kind of data:
- how many people visit a particular stand or see a banner,
- how much time they spend learning about the product or the ad (‘dwell time’),
- how many people come to a stand and engage,
- which stands, if there were several, they interacted with most.
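The first two metrics and the last one can be sketched from raw sensor sightings as follows. This is only an illustration under stated assumptions: the `(device_id, stand_id, timestamp_seconds)` record layout and the function name are hypothetical, and dwell time is approximated as the gap between the first and last sighting of a device at a stand.

```python
from collections import defaultdict


def stand_metrics(sightings):
    """Summarize visitors and average dwell time per stand.

    sightings: iterable of (device_id, stand_id, timestamp_seconds) tuples.
    This record layout is an assumption made for the example.
    """
    # Track the first and last time each device was seen at each stand.
    first_last = {}
    for device, stand, ts in sightings:
        key = (stand, device)
        lo, hi = first_last.get(key, (ts, ts))
        first_last[key] = (min(lo, ts), max(hi, ts))

    # Collect one dwell duration per (stand, device) pair.
    dwell_by_stand = defaultdict(list)
    for (stand, _device), (lo, hi) in first_last.items():
        dwell_by_stand[stand].append(hi - lo)

    return {
        stand: {"visitors": len(dwells), "avg_dwell_s": sum(dwells) / len(dwells)}
        for stand, dwells in dwell_by_stand.items()
    }


# Made-up sample data: two devices at stand "s1", one of them also at "s2".
sample = [
    ("a", "s1", 0), ("a", "s1", 40),
    ("b", "s1", 10), ("b", "s1", 20),
    ("a", "s2", 100),
]
metrics = stand_metrics(sample)
most_visited = max(metrics, key=lambda s: metrics[s]["visitors"])
print(metrics)
print(most_visited)
```

A production system would also need to split repeated visits by the same device into separate sessions, which this sketch deliberately ignores.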
To check this approach, the product was tested at an outdoor music festival in Atlanta. A huge thunderstorm caused the hardware to fail and forced the Forge team to build redundant cache servers to ensure no data would be lost. Even so, it was a successful experience, since the updated version of Exposure managed to accurately calculate useful information for a single stand and count engaged people.
Exposure analytics showing the ratio of engaged vs. passing visitors
Here we should explain that a person is considered to be engaged if they spend no less than a particular amount of time within a set distance of an activation (that is, a banner, a stand or a tent). For instance, we could say that a person is engaged if they stay for at least 30 seconds no more than 3 meters away from the stand.
To calibrate this, we would usually ask Jeremy or some of his colleagues to stand in front of the activation so that we could read the positioning data from a sensor and set the system to count it as an engaged person.
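The engagement rule described above can be expressed as a small check over a visitor's distance readings. This is a sketch, not the production logic: the `(timestamp_seconds, distance_m)` sample format and the function name are assumptions, and the defaults of 3 meters and 30 seconds are taken from the example in the text.

```python
def is_engaged(samples, max_distance_m=3.0, min_seconds=30.0):
    """Return True if the visitor stayed close enough for long enough.

    samples: list of (timestamp_seconds, distance_m) tuples sorted by time.
    The visitor counts as engaged once a continuous run of readings within
    max_distance_m spans at least min_seconds.
    """
    run_start = None
    for ts, distance in samples:
        if distance <= max_distance_m:
            if run_start is None:
                run_start = ts  # a new run of close-range readings begins
            if ts - run_start >= min_seconds:
                return True
        else:
            run_start = None  # the visitor stepped away, so reset the run
    return False


# Made-up readings for three visitors.
passer_by = [(0, 2.0), (10, 2.5), (20, 5.0)]
lingerer = [(0, 2.0), (15, 1.5), (30, 2.8)]
interrupted = [(0, 2.0), (20, 4.0), (25, 2.0), (45, 2.0)]

visitors = [passer_by, lingerer, interrupted]
engaged_count = sum(is_engaged(v) for v in visitors)
print(engaged_count, "engaged out of", len(visitors))
```

Keeping the distance and time thresholds as parameters mirrors the calibration step: the values read from the sensor while someone stands in front of the activation can be passed in per stand.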
So, this is how it was decided to offer Exposure to two different target audiences:
Large Event Analytics
for event organizers
— Reveals the visitors’ flow
— Provides most popular areas as heatmaps
— Shows summary reports about average dwell time, mobile device breakdown, etc.
Experiential Events Analytics
for brand owners and sponsors
— Counts passing traffic
— Calculates dwell time
— Provides engagement data
A real dashboard screenshot of a premium fashion label that was advertising its product via popup activations in major shopping destinations during May 2015 in the UK.
The data on the number of unique visitors and average dwell time allows us to calculate the total brand exposure during that day, which equals 860 hours.
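The calculation behind such a figure is a simple product of the two metrics. The sketch below shows it with purely hypothetical inputs, chosen only so that the result matches the 860 hours quoted above. The actual visitor count and dwell time come from the dashboard and are not reproduced here.

```python
def total_exposure_hours(unique_visitors, avg_dwell_seconds):
    """Total brand exposure: visitors multiplied by average dwell time, in hours."""
    return unique_visitors * avg_dwell_seconds / 3600


# Hypothetical inputs, NOT the real dashboard values:
# 5,160 unique visitors who each stayed 10 minutes (600 seconds) on average.
exposure = total_exposure_hours(unique_visitors=5160, avg_dwell_seconds=600)
print(exposure)
```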
The Experiential Events Analytics also impressed Honda, who used it at the Frankfurt Motor Show in 2015. Jeremy was partnering with Avantgarde UK, an experiential marketing agency that wanted to measure which vehicle sections were the most popular and the most engaging, what the average visit and dwell times were, and so on.
For Honda, the data they retrieved turned out to be very important, as at such shows each minute spent on your stand means one less minute spent on other stands (say, the Toyota one). In March 2016 Honda used those stats to be represented even more effectively at the 86th Geneva International Motor Show.
At the end of January, Jeremy installed the Wi-Fi sensors at the Mobile World Congress 2016 in Barcelona to analyze the usage of mobile devices. After collecting the results, he was surprised to find out that the Apple device share had dropped by 21.5%, while Asus, used by only a few people in 2015, had seen a huge 6000% share increase. This time Jeremy Rollinson supported this data with sales reports, but who knows, maybe such reports made with Exposure will have their own monetizable value soon.
The decision to split the Exposure solution into two products was made in early 2015, and a year has passed already. Of course, the product has evolved even more since then, but that’s a subject for another blog post we’ll publish next week. Stay tuned!