Microsoft AI Researchers Accidentally Expose 38 Terabytes of Confidential Data

Cyber Security Threat Summary:
“Microsoft on Monday said it took steps to correct a glaring security gaffe that led to the exposure of 38 terabytes of private data. The leak was discovered on the company's AI GitHub repository and is said to have been inadvertently made public when publishing a bucket of open-source training data, Wiz said. It also included a disk backup of two former employees' workstations containing secrets, keys, passwords, and over 30,000 internal Teams messages. The repository, named ‘robust-models-transfer,’ is no longer accessible. Prior to its takedown, it featured source code and machine learning models pertaining to a 2020 research paper titled "Do Adversarially Robust ImageNet Models Transfer Better?" ‘The exposure came as the result of an overly permissive SAS token – an Azure feature that allows users to share data in a manner that is both hard to track and hard to revoke,’ Wiz said in a report. The issue was reported to Microsoft on June 22, 2023. Specifically, the repository's README.md file instructed developers to download the models from an Azure Storage URL that accidentally also granted access to the entire storage account, thereby exposing additional private data. ‘In addition to the overly permissive access scope, the token was also misconfigured to allow "full control" permissions instead of read-only,’ Wiz researchers Hillai Ben-Sasson and Ronny Greenberg said. ‘Meaning, not only could an attacker view all the files in the storage account, but they could delete and overwrite existing files as well’” (The Hacker News, 2023).

Security Officer Comments:
Microsoft typically conducts historical rescans of all public repositories in Microsoft-owned or affiliated organizations and accounts. Microsoft’s scanning system did detect the specific SAS URL in the ‘robust-models-transfer’ repository, but the vendor noted that the finding was incorrectly marked as a false positive. Microsoft has since fixed the issue and confirmed that the system now properly reports all over-provisioned SAS tokens.

Suggested Correction(s):
Microsoft has revoked the SAS token and worked with its research and engineering teams to prevent all external access to the impacted storage account. As for the impact, Microsoft stated that no customer data was exposed and that no other internal services were put at risk, so no customer action is required. However, organizations should treat this incident as a reminder to take care when creating and handling SAS tokens: a single misconfiguration can give attackers access to storage resources holding large volumes of sensitive data that could be misused for illicit purposes.

Azure Storage recommends the following Best Practices when working with SAS URLs:

  • Apply the Principle of Least Privilege: Scope SAS URLs to the smallest set of resources required by clients (e.g. a single blob), and limit permissions to only those needed by the application (e.g. read-only, write-only).
  • Use Short-Lived SAS: Always use a near-term expiration time when creating a SAS, and have clients request new SAS URLs when needed. Azure Storage recommends 1 hour or less for all SAS URLs.
  • Handle SAS Tokens Carefully: SAS URLs grant access to your data and should be treated as an application secret. Only expose SAS URLs to clients who need access to a storage account.
  • Have a Revocation Plan: Associate SAS tokens with a Stored Access Policy for fine-grained revocation of a SAS within a Container. Be ready to remove the Stored Access Policy or Rotate Storage Account Keys if a SAS or Shared Key is leaked.
  • Monitor and Audit Your Application: Track how requests to your storage account are authorized by enabling Azure Monitor and Azure Storage Logs. Use a SAS Expiration Policy to detect clients using long-lived SAS URLs.
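
The practices above can be partly checked by inspecting a SAS URL before it is shared. The following is a minimal sketch, not an official Microsoft tool. It uses only the Python standard library and reads the standard SAS query fields (sp for permissions, st and se for start and expiry, sr for the signed resource, ss and srt for account-level scope, si for a Stored Access Policy). The function name audit_sas_url, the wording of the findings, and the example URL are assumptions made for illustration; the one-hour threshold comes from the recommendation listed above.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit

# Threshold taken from the "1 hour or less" guidance quoted above.
MAX_LIFETIME = timedelta(hours=1)


def _parse_utc(value):
    """Parse an ISO 8601 timestamp from the 'st' or 'se' field as UTC."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def audit_sas_url(url, now=None):
    """Return a list of findings for a SAS URL (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    params = {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}

    if "sig" not in params:
        return ["no 'sig' parameter: this does not look like a SAS URL"]

    findings = []

    # Least privilege (scope): account SAS tokens carry 'ss' and 'srt'.
    if "ss" in params or "srt" in params:
        findings.append("account-level SAS: scope is wider than a single blob")
    elif params.get("sr") == "c":
        findings.append("container-level SAS: scope covers a whole container")

    # Least privilege (permissions): anything other than 'r' is flagged.
    extra = sorted(set(params.get("sp", "")) - {"r"})
    if extra:
        findings.append("permissions beyond read: " + "".join(extra))

    # Short-lived SAS: measure from 'st' if present, otherwise from now.
    if "se" in params:
        start = _parse_utc(params["st"]) if "st" in params else now
        lifetime = _parse_utc(params["se"]) - start
        if lifetime > MAX_LIFETIME:
            findings.append(f"lifetime of {lifetime} exceeds {MAX_LIFETIME}")
    elif "si" not in params:
        findings.append("no expiry ('se') and no stored access policy ('si')")

    # Revocation plan: note tokens that reference no Stored Access Policy.
    if "si" not in params:
        findings.append("no stored access policy ('si') referenced")

    return findings


if __name__ == "__main__":
    # Invented example URL; account name, dates, and signature are placeholders.
    example = (
        "https://exampleaccount.blob.core.windows.net/"
        "?sv=2021-08-06&ss=b&srt=sco&sp=rwdlac"
        "&st=2020-07-20T00:00:00Z&se=2051-10-06T00:00:00Z&sig=REDACTED"
    )
    for finding in audit_sas_url(example):
        print("-", finding)
```

A check like this only reads the URL itself. It cannot tell whether a referenced Stored Access Policy is restrictive, so it complements the monitoring and logging practices above rather than replacing them.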