Private API keys and passwords found in AI training dataset – nearly 12,000 details leaked




  • Truffle Security found thousands of pieces of private info in Common Crawl
  • The archives are used to train some of the biggest LLMs today
  • The researchers notified the vendors and helped fix the problem

Cybersecurity researchers have found thousands of login credentials and other secrets in the Common Crawl dataset.

Common Crawl is a nonprofit organization that maintains a freely accessible archive of web data collected through large-scale web crawling. By recent estimates, it hosts over 250 petabytes of web data, and its monthly crawls add several petabytes more.



