Incomplete Reporting Data
Incident Report for Sparkcentral
Resolved
We have completed our investigation and verified the backfilled data. Below are the results, the cause of the incident, and the steps we are taking to prevent similar incidents in the future:

Our reporting system has processors that read the messages flowing into Sparkcentral as an event stream. Our reporting system lost its connection to the event stream and stopped turning incoming messages into reporting metrics. This should have been a recoverable situation. However, when the connection was restored, the reporting processors did not resume from where they had left off, as expected. Instead, they skipped the events that had arrived during the disconnect.

To recover these messages for reporting, Engineering manually replayed the missed events. This was only a partial fix: for some conversations, the status at the time of the replay differed from the status when the messages were originally received. Only a subset of conversations between Feb 19 and Mar 5 is affected. However, that subset is large enough that hourly average metrics are off by up to ±10%, and by up to ±20% in some one-hour slots. Reporting statistics for this 14-day period are therefore inaccurate.

To prevent this from happening again, we are taking the following steps:
- We've reconfigured the reporting processors to properly pick up where they last left off if a disconnect should occur.
- We are changing our reporting system so that replaying events is deterministic, producing the same metrics as when the events were originally processed.
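For readers curious about the mechanics, here is a minimal illustrative sketch of the two fixes above. It is a hypothetical in-memory model, not Sparkcentral's actual code. `EventLog`, `ReportingProcessor`, `resume`, `reset_to_latest`, `status_counts` and the `status_at_receipt` field are invented names. A real event stream would also persist committed offsets outside the processor.

```python
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """Hypothetical append-only event stream; an event's offset is its list index."""
    events: list = field(default_factory=list)

    def append(self, event: dict) -> None:
        self.events.append(event)

    def read_from(self, offset: int) -> list:
        return self.events[offset:]


@dataclass
class ReportingProcessor:
    """Hypothetical processor that tracks the offset of the next unread event."""
    committed_offset: int = 0
    processed: list = field(default_factory=list)

    def resume(self, log: EventLog, reset_to_latest: bool = False) -> None:
        # reset_to_latest=True models the faulty behaviour: jumping to the end of
        # the stream after a reconnect, so events that arrived while disconnected
        # are never processed.
        if reset_to_latest:
            self.committed_offset = len(log.events)
        # Intended behaviour: continue from the last committed offset.
        for event in log.read_from(self.committed_offset):
            self.processed.append(event)
            self.committed_offset += 1


def status_counts(events: list) -> dict:
    """Deterministic replay: count by the status recorded in each event when it was
    received, never by the conversation's current (possibly changed) status."""
    counts: dict = {}
    for event in events:
        status = event["status_at_receipt"]
        counts[status] = counts.get(status, 0) + 1
    return counts
```

Because `status_counts` only reads data captured in the events themselves, replaying the same events always yields the same totals, no matter when the replay runs.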
Posted Mar 30, 2018 - 00:57 PDT
Update
The maintenance/backfill has been completed successfully. However, the backfilled data still needs to be verified before we resolve this incident. Until then, reporting data for Feb 26, approx. 04:00 - 15:00 PST should still be considered incomplete.
Posted Mar 06, 2018 - 02:59 PST
Update
We will be performing unscheduled database maintenance, including a backfill of this data, today at 5pm PT. We expect it to last several hours. During the backfill, some reporting features may be slow or inaccessible, and the Instagram and Sparkcentral Messenger channels may experience some latency. We will let you know when the maintenance is complete. Thank you for your understanding.
Posted Mar 05, 2018 - 15:25 PST
Identified
Reporting data for Facebook and Twitter is currently missing for Feb 26, approx. 04:00 - 15:00 PST. All reporting metrics are affected.
Posted Mar 01, 2018 - 11:57 PST