Meta has announced the development of CodeCompose, an AI-powered code generation tool similar to GitHub’s Copilot.
During an event centered on Meta’s AI infrastructure initiatives, including the creation of custom chips to speed up generative AI model training, the company revealed CodeCompose. Although not publicly available yet, Meta’s internal teams use the tool to receive code suggestions while working in IDEs such as VS Code, for languages including Python.
CodeCompose is built upon Meta’s own research, fine-tuned to suit internal use cases and codebases. Michael Bolin, a software engineer at Meta, explained in a prerecorded video that the tool can be seamlessly integrated into any environment where developers or data scientists work with code.
Meta trained multiple CodeCompose models, the largest containing 6.7 billion parameters, slightly fewer than the model underlying Copilot. Parameters are the parts of a model learned from its training data, and they essentially define the model’s skill at a problem, such as generating text.
To ensure optimal performance, CodeCompose was fine-tuned using Meta’s proprietary code, including internal libraries and frameworks written in the Hack programming language. This allows the tool to incorporate domain-specific knowledge into its code suggestions. Additionally, the training data set was carefully filtered to exclude poor coding practices and errors, minimizing the likelihood of the model recommending problematic code snippets.
In practice, CodeCompose provides suggestions such as annotations and import statements while the user is typing. The system can complete single or multiple lines of code and even fill in large chunks of code when needed.
Bolin highlighted that CodeCompose leverages the surrounding code and even takes into account code comments as signals for generating better suggestions.
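To illustrate the idea (this is a hypothetical sketch, not actual CodeCompose output), a comment-aware completion tool could take a descriptive comment and a function signature as signals and suggest the body, along with the import and type annotations it implies:

```python
from collections import Counter

# Return the n most common words in the given text, lowercased.
def top_words(text: str, n: int) -> list[str]:
    words = text.lower().split()
    return [word for word, _ in Counter(words).most_common(n)]
```

In this sketch, only the comment and the first line of the signature would be written by the developer; everything else is the kind of multi-line suggestion the article describes.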
Meta claims that thousands of employees are already benefiting from CodeCompose, with an acceptance rate exceeding 20%.
However, Meta did not address the controversies surrounding code-generating AI tools.
Microsoft, GitHub, and OpenAI are currently facing a class action lawsuit alleging copyright infringement by Copilot, as it regurgitates sections of licensed code without proper attribution. Aside from legal liability, there are concerns that AI tools like Copilot could inadvertently lead companies to incorporate copyrighted suggestions into their production software.
It remains uncertain whether CodeCompose was inadvertently trained on licensed or copyrighted code. When asked for clarification, a Meta spokesperson stated that CodeCompose was built on InCoder, a code-generating model released by Meta’s AI research division, which was trained on a corpus of public code under permissive open-source licenses from GitHub and GitLab, as well as StackOverflow content. Additional training focused on Meta’s internal code.
Generative coding tools may also introduce security vulnerabilities. A recent study from Stanford revealed that software engineers who use code-generating AI systems are more likely to create security flaws in their applications. While the study did not specifically analyze CodeCompose, it is reasonable to assume that developers using the tool could face similar challenges.
Bolin emphasized that developers are not obliged to follow CodeCompose’s suggestions and that security was a key consideration in its development. He added that Meta is thrilled with the progress of CodeCompose and believes that their developers are best served by keeping this work in-house.