I started looking into cloud-based storage about a year and a half ago with a friend. I was instantly fascinated by the process of setting up these storage containers, whether AWS S3 buckets, Azure Blobs or Google Storage buckets. The process simply involved a series of checkboxes that controlled the configuration of the container. My friend and I noticed how easy it was to make a mistake during this process, especially if you were not aware of what each option meant and the consequences of a misconfiguration.
My initial idea was to develop a tool that would let me scan a large number of S3 buckets quickly and look for such misconfigurations. I wrote a very simple tool in Python that enumerated subdomains of *.s3.amazonaws.com and iterated through the results, looking for enabled directory listing, which is a great indicator of a misconfigured S3 bucket. You can see the difference between a correctly and incorrectly configured S3 bucket below.
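To make the "correct vs. incorrect" distinction concrete, here is a minimal sketch of how a response could be classified. It is not the original tool: the function names `classify_bucket_response` and `check_bucket` are hypothetical. It relies on S3 answering an anonymous request to a bucket's root with an XML document, either a `ListBucketResult` (listing is open) or an `Error` element carrying a code such as `AccessDenied` or `NoSuchBucket`.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET


def classify_bucket_response(body: str) -> str:
    """Classify the XML body returned by an anonymous GET on a bucket root."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return "unknown"
    # Strip the XML namespace, if any, so only the bare tag name is compared.
    tag = root.tag.rsplit("}", 1)[-1]
    if tag == "ListBucketResult":
        return "listable"  # directory listing is enabled for anonymous users
    if tag == "Error":
        code = root.findtext("Code", default="")
        if code == "AccessDenied":
            return "private"  # bucket exists but listing is blocked
        if code == "NoSuchBucket":
            return "missing"
    return "unknown"


def check_bucket(name: str, timeout: float = 10.0) -> str:
    """Hypothetical helper: fetch https://<name>.s3.amazonaws.com/ and classify it."""
    url = f"https://{name}.s3.amazonaws.com/"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # 403/404 responses still carry an XML error body worth parsing.
        body = err.read().decode("utf-8", errors="replace")
    return classify_bucket_response(body)
```

Only buckets classified as "listable" would be worth passing on to the keyword and extension scan.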
The tool was very simple to build because distinguishing between correctly and incorrectly configured buckets was easy to implement. Also, because directory listing was enabled on the majority of these misconfigured buckets, it was easy to parse the listing for file extensions and keywords of interest. Common extensions and keywords that we built into the scanner were: sql, sql.gz, backup.zip, backup.gz, backup.tar, backup.tar.gz and many more.
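The keyword and extension matching could look something like the sketch below. It is an illustration rather than the original scanner: the function name `interesting_keys` and the exact suffix tuple are assumptions based on the list above. It reads the `Key` elements out of a bucket listing and keeps those whose names end in one of the suffixes.

```python
import xml.etree.ElementTree as ET

# Namespace used by S3 bucket listing documents.
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"

# Assumed suffix list, taken from the extensions/keywords named in the text.
INTERESTING_SUFFIXES = (
    ".sql", ".sql.gz", "backup.zip", "backup.gz", "backup.tar", "backup.tar.gz",
)


def interesting_keys(listing_xml: str, suffixes=INTERESTING_SUFFIXES) -> list:
    """Return object keys from a ListBucketResult that end in a suffix of interest."""
    root = ET.fromstring(listing_xml)
    keys = [el.text or "" for el in root.iter(f"{S3_NS}Key")]
    # str.endswith accepts a tuple, so one call checks every suffix.
    return [k for k in keys if k.lower().endswith(tuple(suffixes))]
```

Note that a single listing response returns at most 1,000 keys, so a fuller scanner would need to follow the pagination markers on large buckets.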
The second image above shows what a real-life positive hit looks like: a misconfigured bucket containing a sensitive file that matches our keywords/extensions. You can see the presence of an sql.gz file, which in this case was a backup of a SQL database. These files often contain sensitive information, ranging from account data and passwords to PII.
Once I had finished developing the tool and analysed the data from the first few scans, I decided to conduct more research to try to broaden our attack surface and find more buckets to scan. In doing so I stumbled upon https://buckets.grayhatwarfare.com/, which had essentially done my job for me. This site is nothing short of incredible for this area of research. It currently has 347,683 S3 buckets scanned and indexed, alongside 24,444 Azure Blobs. Grayhat is a user-friendly web application that allows you to search through the indexed data using both keywords and extensions, very much like my initial simple Python script on steroids. I should add that Grayhat also comes with a very useful API!
With the attack surface problem solved, I decided to fully utilize Grayhat and its API. I changed my approach and wrote a Python tool that used the API to make a large number of requests and pull out the buckets/blobs that contained interesting files. I would also check whether any of the buckets I was reporting were writable using the command: aws s3 cp proof.txt s3://[BUCKET_NAME] --no-sign-request. This would often further increase the impact, as an attacker could host malicious files on the company's storage, or even just run up a pricey AWS bill for them. This is definitely one to try on any buckets you find that belong to bounty programs!
In doing so I had tremendous luck, which led to two critical bounties on Bugcrowd (see below). Beyond those two, which were on bounty programs, around 15 companies have privately rewarded me with cash payments because they were very appreciative of my responsible disclosure. I am still in touch with a small number of these companies and conduct routine tests on new features they are adding to their platforms to ensure they are secure. This research has been a great way for me to build relationships with companies and develop my experience. I would highly recommend it to anyone who is interested!
In conclusion, cloud-based storage is great, but it is very easy to make catastrophic mistakes. If you are setting up storage containers, please make sure you test the access control yourself before uploading any sensitive files. I would also advise anyone who is keen to start on this research to use Grayhat, as it is an incredible asset. There are still many, many public buckets out there with SQL database backups sitting in them, so do your part and help secure these companies by disclosing your findings responsibly.
I am also happy to share any tools I have developed, so just reach out to me if they will be of interest to you!