Millions lost because of an intern? ByteDance officially responds
2024.10.19
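Recently, reports have claimed that ByteDance's large model training was "poisoned" by an intern. According to these reports, the incident took place in ByteDance's commercialization team: an intern surnamed Tian, dissatisfied with the team's resource allocation, exploited a vulnerability in HF (Hugging Face) and injected destructive code through shared models, damaging the team's model training results. The reports say the malicious code was injected into more than 8,000 GPU cards and that losses may reach tens of millions of dollars.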
ByteDance insiders told reporters that the company did recently experience an incident in which model training was sabotaged, but that the rumors contain exaggerations and fabrications. The actual event occurred at the end of this year, and the person involved, surnamed Tian, was an intern in the commercialization technology team. Dissatisfied with the team's resource allocation, Tian used attack code to disrupt the team's model training tasks. The reported figure of tens of millions of dollars in losses is also exaggerated.
The same sources added that the business affected by the code intrusion was not the Doubao large model but the model training tasks of the commercialization technology team, which disrupted some technical work in the advertising department. The models the intern compromised through shared models are not part of the group's large model.
In response to the incident, ByteDance issued an official statement on the afternoon of the same day, saying that the intern in question has been dismissed and that the company has reported the matter to the industry alliance and to the intern's school.
The incident has exposed security management problems in ByteDance's technical training, including permission isolation and the auditing of shared code. An industry insider told reporters that permission isolation and auditing help protect a company's core data and intellectual property, prevent data leakage, and strengthen the security of data and systems. For example, real-time monitoring of permission usage helps identify misuse and abnormal operations promptly, and regular permission audits can check whether team members' permissions match the authorization policy and whether any permissions are being abused. These measures also bring challenges, including the cost of cross-departmental cooperation and the resources required for routine maintenance and updates.
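The kind of regular permission audit the insider describes can be illustrated with a minimal sketch. It is not ByteDance's actual tooling; the user names, permission names, and the `audit_permissions` helper are all hypothetical and serve only to show the idea of comparing what each person has been granted against what the authorization policy allows.

```python
# Minimal sketch of a permission audit. All names and data below are
# hypothetical; they do not describe any real system.

def audit_permissions(granted, policy):
    """Return, per user, the permissions held beyond what the policy allows."""
    excess = {}
    for user, perms in granted.items():
        allowed = set(policy.get(user, ()))  # users missing from the policy are allowed nothing
        extra = set(perms) - allowed
        if extra:
            excess[user] = extra
    return excess


# Hypothetical authorization policy: what each user should be able to do.
policy = {
    "alice": {"read_model", "submit_job"},
    "intern_a": {"read_model"},
}

# Hypothetical snapshot of what each user can actually do.
granted = {
    "alice": {"read_model", "submit_job"},
    "intern_a": {"read_model", "write_shared_checkpoint"},
}

findings = audit_permissions(granted, policy)
print(findings)
```

Here the audit flags `intern_a` for holding a write permission on shared checkpoints that the policy does not grant. A real deployment would pull both sides from an access-control system rather than from hard-coded dictionaries.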
ByteDance's most recent disclosure about its large models came on a specific date this year. At the Video Cloud Technology Conference, Volcano Engine unveiled a video preprocessing solution for large model training, aimed at the cost, quality, and performance challenges of training video large models. The solution is already in use for the Doubao video generation model.