Michael Ger, managing director, manufacturing and automotive, Cloudera, explains why the data-intensive processes behind autonomous vehicle development will light the way for future ML use cases.
There’s no shortage of reasons why autonomous vehicle development is a hotly debated subject. The idea of dispensing with a driver behind the wheel marks one of the most radical concepts of the robotic era. It doesn’t just require the right technology to succeed but also a major change in mindsets from all of those who use the road and who are involved in road and traffic management. Safety is, of course, the major concern, which is why when an incident happens it makes headlines.
Behind the debate and news headlines, though, is another story that is often overlooked: that the successful integration of autonomous vehicles into cities and society heavily relies on data. Indeed, data collected from autonomous test vehicles provides the foundational building blocks to “train” vehicles to perform autonomously via technologies such as machine learning (ML).
Given the sheer number of real-world variables an operating vehicle is exposed to and the associated zero-tolerance for error safety requirements, autonomous driving is among the most challenging machine learning use cases imaginable. Success in this application will ensure the pursuit of thousands of less demanding use cases, which is why the development of autonomous cars has implications across many different sectors, particularly smart cities.
From an autonomous vehicle perspective, the key machine learning requirement involves training the “perception layer”, which means using sensor data (telematics, camera, lidar, radar, inertial measurement units and more) to accurately “see” the conditions the vehicle is encountering. This is fundamental, since any actions taken – such as instructing the vehicle to make path adjustments – will be contingent on an accurate perception layer.
One of the reasons that autonomous driving development is likely to drive future use cases is that the machine learning models and neural networks that power this vital perception layer perform best on large datasets of great variety. And autonomous vehicles rely on massive datasets. Indeed, creating an autonomous vehicle is a petascale endeavour. Yes, it also relies on classical automotive engineering expertise, but the average amount of data needed to build an autonomous vehicle is estimated to be around 150 petabytes. In short, it is as much a data analytics and machine learning challenge as a mechanical one.
The volumes of data that need to be collected and processed require advanced data management capabilities, involving data lakes and a clear understanding of the data lifecycle. Future use cases depend not just on an understanding of how to manage and process the data but also on the opportunities this data can bring.
Historically, fragmented data management lifecycles have limited the ability to advance new use cases due to the effort, cost and time associated with managing the lifecycle itself. Optimising the lifecycle allows it to be iterated more quickly and frequently, which in turn provides continuous improvement of machine learning models.
To make this happen, automakers, cities and other stakeholders must work together and leverage the latest hardware and software technologies in what is a rapidly evolving environment. The capabilities required to master the Internet of Things and the machine learning data analytics lifecycle extend beyond the realm of any one company. As a result, a standards and partner ecosystem-based approach is essential to corralling the capabilities to truly transform smart cities and connected communities.
This level of working together is critical to establishing solutions, as joint projects result in both standards and reusable templates. As a recent example, Cloudera has engaged in an initiative called Project Fusion, a multi-party automotive industry technology collaboration to define a data lifecycle platform for enabling and optimising future connected and autonomous vehicle systems. The partners are aiming to build a vehicle-to-cloud solution that provides a data management technology.
Working collectively will also ensure some of the other barriers to maximising the use of big data and machine learning in autonomous vehicle development and other use cases can be addressed. Waste and inefficiencies need to be taken out of the system to reduce the cost and time of managing the lifecycle.
Crucially, potential data privacy issues must be confronted. As previously mentioned, training autonomous vehicles to drive themselves relies on training data recorded in the real world. As a result, solution providers must take care not to collect and store private information such as drivers’ faces and licence plate numbers.
Capabilities must be provided to redact this information before the data is collected and stored. This requires significant data processing capabilities to recognise and filter data “in-stream”. In addition, any information collected must adhere to regulations such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA).
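As a rough illustration of what filtering “in-stream” could mean, the Python sketch below drops private detections from each frame as it arrives, before anything is stored. The record layout, the field names and the label set are all hypothetical, chosen only for this example; a real pipeline would more likely blur or mask pixel regions in the sensor data rather than discard labelled records.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical label set: detection classes treated as private in this sketch.
PRIVATE_LABELS = {"face", "licence_plate"}


def redact_frame(frame: Dict) -> Dict:
    """Return a copy of the frame with private detections removed and counted."""
    detections = frame.get("detections", [])
    kept = [d for d in detections if d["label"] not in PRIVATE_LABELS]
    return {
        "frame_id": frame["frame_id"],
        "detections": kept,
        "redacted_count": len(detections) - len(kept),
    }


def redact_stream(frames: Iterable[Dict]) -> Iterator[Dict]:
    """Redact each frame as it arrives, before it is written to storage."""
    for frame in frames:
        yield redact_frame(frame)


# Usage: two made-up frames pass through the filter before being "stored".
incoming = [
    {"frame_id": 1, "detections": [{"label": "car"}, {"label": "face"}]},
    {"frame_id": 2, "detections": [{"label": "licence_plate"}]},
]
stored = list(redact_stream(incoming))
```

Because the filter is a generator, frames are processed one at a time and the unredacted data never needs to be held in full, which is the property the “in-stream” requirement is after.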
Machine learning is vital to helping cities, technology vendors and other stakeholders move beyond simply monitoring and reporting on data from sensors and other devices to making optimised, real-time decisions based on it. Take transportation as an example.
It is one thing to monitor traffic conditions and report that congestion is occurring, but it is an altogether more enticing value proposition to use machine learning to proactively guide citizens via recommendations, such as suggesting alternative routes or advising travellers to travel at a different time. Using machine learning, these recommendations can be based both on real-time conditions and on learnings from the past.
Proactive, optimised and real-time decision-making is the hallmark of the benefits of machine learning and we are only at the start of the journey. We still have a lot to learn about the potential of machine learning and many of its future use cases are beyond our current imagination.
What we do know is that autonomous driving can teach us a lot about its potential and lead us towards many new applications. And what we need to ensure is that the foundations and ecosystems are in place to understand the importance of advanced data management and data lifecycles so that no opportunity is missed.