quickBytes | DeepCode raises $4M to uncover bugs using models trained on millions of lines of open source code

DeepCode, a Swiss startup building an intelligent code review tool, raised $4M to expand its machine learning models for automatic vulnerability detection. Unlike other tools that simply review imported packages and dependencies, DeepCode can detect more complex issues, such as cross-site scripting and SQL injection vulnerabilities, by understanding the intent behind the code, not just catching syntax mistakes. DeepCode currently supports Java, JavaScript, and Python, and hopes to expand to C#, PHP, and C/C++.

DeepCode is trained on thousands of open source projects, mostly public repositories on GitHub. By ingesting GitHub’s rich commit history, DeepCode understands changes in code so that it can infer where bugs may have been in the code and what changes were needed to fix them. According to CEO Boris Paskalev, “On average, developers waste about 30% of their time finding and fixing bugs, but DeepCode can save half of that time now, and more in the future.”

DeepCode and similar tools are starting to automate code creation, either through smarter code reviews or more accurate code completion. DeepCode, however, adds an extra level of complexity: whereas code completion tools like TabNine are trained by taking snapshots of code to provide suggestions, DeepCode compares snapshots across commits to build its models. By understanding how open source code changes over time, DeepCode can recommend similar changes to developers.
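To make the idea of learning from commit history concrete, here is a minimal Python sketch of mining "before and after" pairs from bug-fix commits. It is an illustration only, not DeepCode's actual pipeline: the function names, the keyword filter on commit messages, and the `(message, before, after)` tuple format are all assumptions made for this example, and a real system would work on parsed code rather than raw text lines.

```python
import difflib


def changed_lines(before: str, after: str):
    """Return (removed, added) lines between two snapshots of one file."""
    removed, added = [], []
    for line in difflib.ndiff(before.splitlines(), after.splitlines()):
        if line.startswith("- "):
            removed.append(line[2:])
        elif line.startswith("+ "):
            added.append(line[2:])
        # Lines starting with "  " (unchanged) or "? " (hints) are skipped.
    return removed, added


def mine_fix_pairs(commits, keywords=("fix", "bug")):
    """Collect (removed, added) pairs from commits that look like bug fixes.

    `commits` is a hypothetical iterable of (message, before, after) tuples.
    The keyword match on the message is a crude stand-in for a real
    bug-fix classifier.
    """
    pairs = []
    for message, before, after in commits:
        if any(word in message.lower() for word in keywords):
            removed, added = changed_lines(before, after)
            if removed or added:
                pairs.append((removed, added))
    return pairs


if __name__ == "__main__":
    history = [
        (
            "Fix SQL injection in user lookup",
            'cursor.execute("SELECT * FROM users WHERE id = " + user_id)',
            'cursor.execute("SELECT * FROM users WHERE id = ?", (user_id,))',
        ),
        ("Update README", "old text", "new text"),
    ]
    for removed, added in mine_fix_pairs(history):
        print("before:", removed)
        print("after: ", added)
```

A snapshot-based tool would see only the final version of each file. Pairing the removed lines with the lines that replaced them is what lets a model associate a risky pattern with its fix.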

DeepCode also joins a growing number of startups that are working to leverage huge swathes of open, public coding data. As the host for much of the developer world’s open source repositories, GitHub holds a powerful position as the aggregator of terabytes of valuable coding data. GitHub is likely to develop its own capabilities or, as it has done with Pull Panda and Dependabot, acquire fledgling companies. Coding data is GitHub’s most valuable asset, and the company will likely put it to use as it works to become a hub for code automation and intelligence.