Implementing In-Place OTA Updates with Zero Downtime
August 17, 2020
OTA updates are not a new concept, not even in the automotive industry. They began with the naïve, straightforward approach of in-place updates, in which the update overwrites the existing version of the software with the new one. This required the device/platform to reboot into ‘recovery’ mode, which resulted in significant downtime. The approach was workable while update files were small.
We then moved on to not-in-place (NIP) updates using dual banking. The rationale for this approach included:
- Redundancy for safety purposes
- Almost zero* downtime of the device/platform
- Cybersecurity enhancements: the inactive bank (i.e., bank ‘B’) is not activated until its signature has been validated, which protects the device/platform against malicious software being installed
- A stronger focus on safety and robustness, known as ‘Fail Safe’
However, this introduced an additional cost for the OEM: double the memory space was required, increasing the bill of materials (BOM) of each ECU that supports dual banking, while resource-constrained ECUs remained on the former in-place update, with its associated downtime.
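To make the dual-bank flow concrete, here is a minimal Python sketch, assuming a hypothetical `DualBankECU` whose two full-size banks are held in memory. An HMAC-SHA256 tag stands in for the real code signature (production systems typically use asymmetric signatures), and the key and class names are illustrative only, not any OEM's or HARMAN's API.

```python
import hashlib
import hmac

# Hypothetical stand-in for a code-signing key; real ECUs typically verify
# asymmetric signatures rather than a shared HMAC secret.
SIGNING_KEY = b"hypothetical-oem-key"


def sign(image: bytes) -> bytes:
    """Produce the stand-in 'signature' for an image."""
    return hmac.new(SIGNING_KEY, image, hashlib.sha256).digest()


class DualBankECU:
    """Hypothetical ECU with two full-size banks, 'A' and 'B'."""

    def __init__(self, initial_image: bytes):
        self.banks = {"A": initial_image, "B": b""}
        self.active = "A"

    @property
    def inactive(self) -> str:
        return "B" if self.active == "A" else "A"

    def install(self, image: bytes, signature: bytes) -> bool:
        # 1. Write the new image to the inactive bank; the running bank is untouched.
        self.banks[self.inactive] = image
        # 2. Validate the signature before the bank may be activated.
        if not hmac.compare_digest(sign(image), signature):
            return False  # keep running from the current, known-good bank
        # 3. Switch banks; the new image runs after the next reboot.
        self.active = self.inactive
        return True

    def running_image(self) -> bytes:
        return self.banks[self.active]
```

A failed install leaves the device on the known-good image, and after a successful switch bank ‘A’ still holds the previous version for rollback. The price is that both banks must be sized for a full image, which is exactly the BOM cost described above.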
This created the need for an in-place, zero* downtime update: one that reduces costs and improves the user experience while maintaining safety and cybersecurity capabilities.
So, is it feasible?
The short answer is “yes.” From a technological point of view, various types of virtual-file-system (VFS) mechanisms already exist. The best-known usage of VFS is Android’s Device Mapper. The VFS presents unified, plain-text content to the user from encrypted data spread across scattered physical blocks, and the entire operation is transparent to the user. Another example is UBI (Unsorted Block Images), a lower-level VFS implementation that acts as a virtual-to-physical memory manager: it is the driver that manages the physical memory, including formatting, while presenting a virtual volume to the upper layers.
However, there are a few key aspects to factor in when pursuing a zero-downtime in-place update: security, safety, reliability, compression, and maturity. Let’s look at each of these in the context of in-place updates with zero downtime.
Safety & Reliability
As cars advance toward autonomous/automated driving, safety becomes a key pillar that must always be maintained at an adequate level. Redundancy mechanisms, such as dual banking or an M-out-of-N (MooN) scheme, are therefore put in place. Manipulating memory while the ECU is running is a safety-critical process that must be robust enough to handle fail-safe scenarios. For example, if an unexpected power outage reboots the ECU in the middle of an update, the process needs to continue from the last update point rather than re-run the entire update flow from the beginning. Can an in-place update process comply with such safety requirements and regulations without a fail-safe capability?
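The requirement to continue from the last update point, rather than restart, is commonly met with a checkpoint journal. The following is a minimal Python sketch, assuming a hypothetical `apply_update` that writes whole blocks and a plain dict standing in for non-volatile storage that survives a reboot; it illustrates the idea and is not any specific product's fail-safe mechanism.

```python
class PowerLoss(Exception):
    """Simulated power outage during the update."""


def apply_update(flash, new_blocks, journal, fail_after=None):
    """Write new_blocks into flash, resuming from the journal's checkpoint.

    flash      -- list of block contents (simulated flash device)
    new_blocks -- dict {block_index: new_content}, applied in sorted order
    journal    -- dict standing in for non-volatile storage; survives 'reboots'
    fail_after -- simulate a power loss after this many writes in this run
    Returns the number of blocks written in this run.
    """
    steps = sorted(new_blocks.items())
    start = journal.get("next_step", 0)  # 0 on a fresh update
    writes = 0
    for step in range(start, len(steps)):
        index, content = steps[step]
        if fail_after is not None and writes == fail_after:
            raise PowerLoss(f"power lost before step {step}")
        flash[index] = content           # whole-block write: safe to repeat
        journal["next_step"] = step + 1  # commit the checkpoint after the write
        writes += 1
    journal["done"] = True
    return writes
```

Because each step writes a complete block, repeating a step after a power loss between the write and the checkpoint commit yields the same result. A delta that reads its source from the very block it is overwriting would lose this property, which is one reason in-place schemes need extra care.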
Security
Connectivity has enabled quite a few helpful features in our cars: defects can be fixed at home, and services and vehicle capabilities can be extended simply by subscribing to a new service from the car or a mobile device. Along with the good, however, comes the bad: a new attack surface for hackers to exploit.
These attacks forced OEMs to take protective and preemptive measures to secure both the driver and the content of the vehicle’s software, against malicious hackers and against IP theft alike. One such measure is encryption: OEMs began storing, and even transmitting, the differential update package for each software component in encrypted form. The software images on the car itself are kept encrypted in the same way.
Advanced operating systems (OSs), such as Android, keep their images encrypted on the device. The user receives decrypted data through a dedicated mechanism that keeps the data encrypted on the device while handing it unencrypted to the user. Such OSs cannot be updated by directly accessing their physical memory unless an on-board entity triggers the OS to generate an uncrypted mapper file.
This presents a dilemma for the in-place zero* downtime requirement: which is more important, zero downtime achieved through memory manipulation, which cannot operate on an encrypted image, or keeping the encryption and “paying” with the downtime required for the decryption-update-encryption flow? In other words, which is more valuable: the end user’s time or the OEM’s IP?
Signature validation is another security measure; it ensures that the correct software image and update are used in the update flow. Using a low-level VFS in current update algorithms might speed up the device’s update process somewhat, but these algorithms operate at such a low level that it is nearly impossible for them to verify the signature before every read or write command. As a result, hackers can treat this low-level VFS as an attack surface, altering the original data so that malicious data is written to flash memory. This presents another dilemma in assessing which option is more valuable: 1) zero downtime that uses memory manipulation on a remote ECU, where the updater cannot verify the integrity of each memory segment on the recipient ECU (because there is no update client on the recipient ECU), or 2) integrating such an update client, which can validate each piece of data received before applying it?
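Option 2, an update client that validates each piece of data before applying it, can be sketched as follows. This is a minimal Python sketch under stated assumptions: a hypothetical manifest of per-segment SHA-256 digests, authenticated once with an HMAC tag that stands in for a real asymmetric signature. The key and function names are illustrative, not part of any real updater.

```python
import hashlib
import hmac

# Hypothetical stand-in for the OEM signing key (real systems typically use
# asymmetric signatures so the ECU never holds a signing secret).
MANIFEST_KEY = b"hypothetical-oem-key"


def make_manifest(segments):
    """OEM side: per-segment SHA-256 digests plus a tag over the whole list."""
    digests = [hashlib.sha256(s).digest() for s in segments]
    tag = hmac.new(MANIFEST_KEY, b"".join(digests), hashlib.sha256).digest()
    return digests, tag


def apply_verified(flash, segments, digests, tag):
    """ECU side: authenticate the manifest, then check each segment before writing.

    Returns the number of segments written; raises on the first bad segment,
    so unverified data never reaches flash.
    """
    expected = hmac.new(MANIFEST_KEY, b"".join(digests), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("manifest authentication failed")
    if len(segments) != len(digests):
        raise ValueError("segment count does not match manifest")
    for i, segment in enumerate(segments):
        if hashlib.sha256(segment).digest() != digests[i]:
            raise ValueError(f"segment {i} failed verification; not written")
        flash[i] = segment  # only verified data is written
    return len(segments)
```

A tampered segment is rejected before it is written. Segments verified earlier have already been applied, so on a real ECU this check would typically need to be combined with a preserved source image or a resumable journal.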
Maturity
When deciding on a differential update solution, you need to look for both experience and maturity. The solution should already be updating numerous vehicles and should have been selected by the biggest automotive manufacturers in the market. Importantly, it needs to be up and running, managing updates for tens of millions of vehicles.
HARMAN Smart Delta zero downtime approach
The HARMAN zero* downtime solution uses the platform’s Logical-to-Physical (L2P) addressing capability, much like the ‘Device Mapper’ used by Linux and Android, as follows:
- New data is not written to the active storage but to a different, available storage area. This preserves the source image throughout the update process.
- After all the new data has been written and the map information (which data was written where) has been created, the platform reboots and applies the new mapping. For example, if the image occupies flash blocks 0-100 and only block 6 has changed, the new block 6 is written elsewhere (for example, block 101), and after the update the mapping layer redirects all reads of block 6 to block 101.
- To prevent unbounded storage growth, once a ‘rollback’ is marked as ‘not required,’ the HARMAN solution uses the L2P mechanism to reclaim the extra space. While the platform runs the new version through the mapper, the HARMAN solution copies the new data back to the original storage (which is now unused), and during the next reboot it deletes the content of the extra storage used in the update phase.
- By using the L2P mechanism, the HARMAN solution maintains its bit-exact update capability and its signature verification of the source, the update package, and the target.
- In an Android environment, using the operating system’s ‘Device Mapper’ allows the solution to keep the OS’s encryption/uncryption/decryption mechanism intact, thus maintaining the image’s security and safety.
- The HARMAN modus operandi maintains the platform’s safety because it operates on an untouched image: the update runs in the background and takes effect only after the update process has completed successfully and securely.
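The L2P steps above can be sketched as a small Python simulation. This is a minimal sketch, assuming a hypothetical `L2PFlash` class whose mapping tables live in memory; a real mapping layer operates on block devices and must survive power loss, and this is not HARMAN’s implementation.

```python
class L2PFlash:
    """Hypothetical logical-to-physical (L2P) remapping over simulated flash."""

    def __init__(self, image, spare_blocks):
        self.n = len(image)  # logical image size in blocks
        self.physical = list(image) + [None] * spare_blocks
        self.active_map = {i: i for i in range(self.n)}  # logical -> physical
        self.pending_map = dict(self.active_map)          # applied at next reboot
        self.free = list(range(self.n, self.n + spare_blocks))

    def read(self, logical):
        return self.physical[self.active_map[logical]]

    def stage_write(self, logical, data):
        # New data goes to a spare block; the source image stays untouched.
        target = self.free.pop(0)
        self.physical[target] = data
        self.pending_map[logical] = target

    def reboot_and_apply(self):
        # The staged mapping takes effect on reboot. Spare blocks that the new
        # mapping no longer references are erased and returned to the pool.
        self.active_map = dict(self.pending_map)
        in_use = set(self.active_map.values())
        for phys in range(self.n, len(self.physical)):
            if phys not in in_use and phys not in self.free:
                self.physical[phys] = None
                self.free.append(phys)

    def copy_back(self):
        # Called once rollback is marked 'not required': while reads still go
        # through the remapped spares, copy the new data into the original
        # (now unused) blocks and stage an identity mapping for the next reboot.
        for logical, phys in self.active_map.items():
            if phys != logical:
                self.physical[logical] = self.physical[phys]
                self.pending_map[logical] = logical
```

Following the example of an image in blocks 0-100 where only block 6 changes: `stage_write(6, ...)` places the new data in block 101 while reads still return the old block 6, `reboot_and_apply()` redirects reads of block 6 to block 101 and keeps the original block for rollback, and after `copy_back()` plus one more reboot, block 6 holds the new data and block 101 is free again.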
With the HARMAN end-to-end (E2E) OTA solution, including its best-in-class Smart Delta technology, security, safety, reliability, and user experience are all addressed, and continuously maintained, when supporting in-place updates with zero* downtime.
*Zero downtime: except for the short, regular reboot needed for the new target to take effect once the update is complete.