Yousign V2 SMS incident
Incident Report for Yousign
Postmortem

Incident Summary

On November 29, 2024, between 18:18 CET and 19:25 CET, the SMS capabilities of our V2 services were disrupted, preventing customers from completing document signatures using SMS security codes.

Importantly, V3 services were not affected by this incident. As previously announced, V2 services are nearing their end-of-life (EOL), which may have contributed to specific limitations.

By 19:25 CET, the issue was resolved, and all functionalities were fully restored.

We want to provide you with a clear and detailed account of the SMS service degradation affecting our V2 services on November 29, 2024. This communication reflects our commitment to transparency and continuous improvement.

We sincerely apologize for any inconvenience caused to you or your customers.

Summary

Our V2 services experienced a degradation that caused a loss of SMS capabilities on 2024-11-29 between 18:18 CET and 19:25 CET. V3 was not affected by this incident. Only V2, for which end-of-life has already been announced, was affected.

During this time frame, customers using SMS security codes were unable to complete their document signatures.

V2 SMS capabilities were restored as of 19:25 CET, and with them the ability to fully use our V2 services.

In keeping with our principles of transparency and continuous improvement, this postmortem explains what caused the issue, how we reacted to the incident, and what we plan to do to prevent a similar incident in the future.

We would like to apologize for the impact on you and your customers.

Chronology

18:18 CET : Post-incident analysis shows that the asynchronous notification process (in charge of processing SMS messages) started accumulating a backlog at this time.

18:21 CET : Issue raised by our Care team.

Our post-incident analysis shows that a flaw in our monitoring process caused a delay between the Care team being informed and the Engineering team intervening. The threshold alert fired correctly but did not trigger an on-call escalation.
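To illustrate the gap described above, here is a minimal sketch of an alert-routing rule in which a sustained threshold breach on a customer-facing capability pages the on-call engineer instead of only posting a notification. It is not a description of Yousign's tooling: the `Alert` fields, the `route` function, the alert name, and the 300-second default are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    """A fired threshold alert (hypothetical fields, for illustration only)."""
    name: str
    customer_facing: bool    # does the alert cover a customer-facing capability?
    breach_seconds: float    # how long the threshold has been exceeded


def route(alert: Alert, page_after_seconds: float = 300.0) -> str:
    """Return "page" when on-call should be paged, otherwise "notify".

    Assumed rule: a customer-facing alert that stays in breach for at least
    page_after_seconds escalates to a page; everything else is a notification.
    """
    if alert.customer_facing and alert.breach_seconds >= page_after_seconds:
        return "page"
    return "notify"


# Hypothetical alert name; a backlog that persists for 10 minutes gets paged.
backlog = Alert("sms-notification-backlog", customer_facing=True, breach_seconds=600.0)
decision = route(backlog)
```

Under this assumed rule, an alert that fires correctly can no longer stay silent for on-call once the breach is sustained, which is the failure described in the chronology.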

18:59 CET : Escalation raised to Engineering teams.

19:25 CET : End of incident, all services operational.

What Happened

The issue stemmed from a malfunction in one of our processing workers. While the processes appeared active, they were no longer effectively handling new events, leading to a backlog in SMS notifications.

Additionally, a gap in our monitoring systems prevented early detection of this specific failure mode, delaying our response.
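The failure mode described here, processes that appear active but no longer handle events, is one that a plain liveness check cannot see. The following is a minimal sketch of a progress-based check, assuming the monitoring system can sample a cumulative processed-event counter and the queue depth for each worker. The `WorkerSample` fields, the `is_stalled` function, and the 120-second default are hypothetical and do not describe Yousign's actual monitoring.

```python
from dataclasses import dataclass


@dataclass
class WorkerSample:
    """One monitoring observation of a queue worker (hypothetical fields)."""
    timestamp: float        # seconds since an arbitrary reference point
    process_alive: bool     # what a plain liveness check would report
    processed_total: int    # cumulative number of events handled so far
    queue_depth: int        # events currently waiting to be processed


def is_stalled(previous: WorkerSample, current: WorkerSample,
               max_idle_seconds: float = 120.0) -> bool:
    """Return True when the worker looks alive but is making no progress.

    The worker is considered stalled when, across at least max_idle_seconds,
    its processed counter has not moved while events are waiting in the queue.
    """
    elapsed = current.timestamp - previous.timestamp
    no_progress = current.processed_total == previous.processed_total
    has_backlog = current.queue_depth > 0
    return (current.process_alive and no_progress and has_backlog
            and elapsed >= max_idle_seconds)


# Illustrative values only: the process is up, the counter is frozen,
# and the backlog has grown over three minutes.
before = WorkerSample(timestamp=0.0, process_alive=True, processed_total=1500, queue_depth=4)
after = WorkerSample(timestamp=180.0, process_alive=True, processed_total=1500, queue_depth=250)
stalled = is_stalled(before, after)
```

Comparing two samples rather than inspecting one is the design choice that matters: a single observation of a running process says nothing about whether work is being completed.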

Remedies & Fix

Our technical teams will actively work on several fronts:

  • Worker Investigation: As V2 is at the end-of-life stage, we will not make any improvements in this area, apart from continuing to migrate our customers to V3.
  • Enhanced Monitoring: Our monitoring systems will be extended to proactively detect similar issues in their early stages, ensuring a faster response.
  • Incident Review: We will review and update our escalation protocols to improve coordination across teams during incidents.
Posted Dec 03, 2024 - 15:22 CET

Resolved
We identified an issue with our notifications workflow that affected our SMS notifications for all V2 customers from 2024-11-29 18:18 CET to 2024-11-29 19:25 CET.

A postmortem with more details will be provided in the coming days.

Impacted applications:
- Yousign V2 - API V2 - https://api.yousign.com
- Yousign V2 - APP V2 - https://webapp.yousign.com
Posted Nov 29, 2024 - 18:30 CET