FAQ

General

What is this?
Codeium is the modern coding superpower, a code acceleration toolkit built on cutting-edge AI technology. For now, we are releasing a free code generation tool, but we want to hear from you to shape the product roadmap (see questions under Next Steps).

We believe there are too many parts of the modern coding workflow that are boring, tedious, or downright frustrating, from regurgitating boilerplate to poring through StackOverflow. We can use recent advances in AI to eliminate these parts, making it seamless to turn your ideas into code. With easy integration into editors, you can focus on being the best software developer, not the best code monkey. Read more about why we decided to build this and our philosophy on building this in this blog post.
How does this work?
At the core, we're using a large generative machine learning model that is capable of understanding the context of your code and comments in order to generate suggestions on what you might want to type next. We have coupled this with state-of-the-art ML serving infrastructure to create a highly performant and scalable product. Codeium is not always right (there's always room for improvement!), but you will feel like you have superpowers with even just 10% of your work being assisted by Codeium.
Who should use this?
Codeium does not replace a software engineer, and the developer is still in charge and responsible for the code written. Codeium does not test the code automatically, so a developer should carefully test and review all code suggested by Codeium. So while anyone can use Codeium, we recommend it especially for people who already have fundamental knowledge of software engineering and coding. It's never great to be dependent on anything, even a superpower.
How do I use this?
Follow our installation guides for Visual Studio Code, JetBrains, Jupyter Notebook, Google Colab, and Vim / Neovim to get started today! The installation guides also have instructions on how to view, accept, and reject suggestions. We hope to compile best practices given feedback soon!
Why am I getting bad results?
Like any other superpower, Codeium is more effective in certain situations than others. Codeium only has limited context to generate suggestions, doesn't have enough training data for new or esoteric capabilities of every coding language/framework, and anecdotally performs better on certain classes of prompts.

But also just like any other superpower, one can learn how to wield Codeium more effectively. We hope to compile best practices given feedback, but play around with how you write comments or function/argument names to see what causes Codeium to give the best results!
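As a rough illustration of what "playing around" with comments and names could look like, compare the two stubs below. This is only a sketch: the function names are hypothetical, and the body of the second function was written by hand so the example runs. It is not actual Codeium output, and we are not claiming any particular result.

```python
from typing import List


# Vague: the name, argument, and missing comment give a completion
# model very little context to work with.
def proc(d):
    ...


# More descriptive: the comment, function name, argument names, and
# type hints all describe the intent.
# Return the n largest values in `scores`, sorted from highest to lowest.
def top_n_scores(scores: List[float], n: int) -> List[float]:
    # Hand-written body (an assumption for illustration, not a suggestion
    # produced by Codeium).
    return sorted(scores, reverse=True)[:n]


if __name__ == "__main__":
    print(top_n_scores([0.2, 0.9, 0.5], 2))
```

The second stub simply puts more of your intent into the limited context described above, which is the kind of change worth experimenting with.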
What programming languages does this support?
Empirically, we believe Codeium's performance is good for the following languages, and have enabled Codeium by default on any files that use them (alphabetical order): C, C++, C#, CoffeeScript, CSS, CUDA, Go, HCL, HTML, Java, JavaScript, JSON, Less, Objective-C, pbtxt, PHP, Protobuf, Python, Ruby, Rust, Sass, SCSS, shell, SQL, Starlark, TypeScript, TSX, Vue, YAML.

For the following languages, we are not yet confident enough in the quality of suggestions to enable Codeium by default, but we believe that some upcoming model improvements (early February 2023) will make performance substantially better, at which point they will be enabled by default (alphabetical order): Assembly, Clojure, CMake, Dart, Delphi, Dockerfile, Elixir, F#, Groovy, Haskell, Julia, Kotlin, LISP, Lua, Makefile, MATLAB, Perl, PowerShell, R, Scala, Solidity, Swift, VBA.
I like to code. Why would I like this?
We like to code too! But when we thought about it more, there were a lot of parts of coding that we didn't actually like that much or that were a bit of a bore (e.g., regurgitating boilerplate, poring through StackOverflow, trying to match an implicit style guide). Everyone has had frustrating experiences just trying to turn ideas into code, and those are what we want to address.

Looking back at history, coding could once only be done in the terminal, but today the vast majority of software developers use tools like modern code editors and IntelliSense to make their coding experience better. Those tools didn't replace coding or people needing to know "how to code." Rather, they accelerated software development. We just think there is more juice left to be squeezed.
Will this always be free?
If you are reading this, then yes. Our philosophy is that if you are helping us from early on to achieve long term success by using Codeium and giving us feedback, then we should provide you with some long term benefit, the clearest being cost. For you, this code generation tool will be free now, free forever. That being said, if you are curious how we plan to eventually monetize, check out the relevant question in the Long Term Vision section.
Why the name Codeium?
We like to compare Codeium to fictional materials like vibranium or adamantium, materials that could be manipulated in multiple ways and could be used by anyone to gain powers that amplified their intrinsic abilities. Codeium also sounds like an element, and we like to think that it will remove all of the annoying or boring parts of the modern coding process, resulting in a purer form of software engineering.

Privacy & Ethics

Will Codeium regurgitate private code?
Not private code. Codeium's underlying model was trained on publicly available natural language and source code data, including code in public repositories. Similar to other such models, the vast majority of the suggested code has never been seen before, as the suggestions largely match the style and naming conventions in your code. Research has shown that the cases where there may be exact matching are often when there are near-universal implementations or where there is not enough context to derive these stylistic effects from.
Is there potential for bias, profanity, etc?
As with any other ML model, results from Codeium reflect the data used for training. The data used for training is primarily in English and does not have a uniform distribution of programming languages, so users may see degraded performance in certain natural and programming languages. In addition, there may have been offensive language, insecure coding patterns, or personally identifiable information in the publicly available training data. While we have anecdotal evidence that this information, especially personal data, is not produced verbatim, we always warn users to (a) not try to explicitly misuse Codeium and (b) review and test all produced code as if it is your own.
What data does Codeium collect?
Please see our Privacy and Security page, as well as our Privacy Policy and Terms of Service. The code generated by Codeium belongs to you, so you assume both the responsibility and the ownership. In order to continuously improve, Codeium collects telemetry data such as latency, engagement with features, and suggestions accepted and rejected. This data is only used for directly improving the functionality, usability, and quality of Codeium, detecting abuse of the system, and evaluating Codeium's impact. Your data is not shared with, sold to, or used by any other party, company, or product, and we protect your data by encrypting the transmitted data end-to-end. This data is primarily used or inspected in aggregate, and can only be directly accessed in extreme cases by authorized members of the Codeium team. We want Codeium to be a product you can trust, and so any data collected will only be used to further increase Codeium's value to you. Codeium also gives users the option to opt out of allowing Codeium to store (and therefore use) their code snippet data post-inference.
Does Codeium use / emit open source data? How is attribution done?
We train only on permissively licensed code, which includes open source. We deeply respect open source, and the work done by these communities has undoubtedly been instrumental in making the software industry what it is today. That being said, open source code is often the highest quality permissively licensed code, and just as anyone who wants to be good at something needs to learn from the best, AI models need to learn from the best to be any good. Open source code is publicly available, and it is no secret that people often rip code off of open source (often copy-pasting large sections at a time) without any permission from the original authors/community or any attribution. We completely understand the concern that the AI will just abstract this out by vomiting out large blocks of open source material without any attribution. There are two required halves for a solution to this problem: (1) how to minimize the chances that such events happen and (2) how to detect such events and properly handle them.

For the first, these AI codegen models have shown repeatedly in research that the vast majority of the time, they produce entirely novel code. This is why these models are so exciting - the intent is not to create a new way to rip off code, and the research backs that up. That being said, at Codeium we have taken this one step further by explicitly capping generation lengths. So, instead of producing one gigantic block of code on a single inference, we opt towards building up that code in chunks, showing multiple options and waiting for user selections / modifications at each chunk before continuing to the next chunk (if at all!). This injects user decision making into the code generation, adding another layer of protection against the rare cases where the model would have produced a long block of code verbatim. While of course these measures are not perfect, using AI codegen and capping generation lengths is a huge step past today's much-too-common flow of people copying open source verbatim.
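To make the chunked flow described above more concrete, here is a minimal sketch of what capping generation length and pausing for a user decision at each chunk might look like. It is an illustration under stated assumptions, not Codeium's actual implementation: `generate_in_chunks`, `toy_model`, the `Model` and `Chooser` types, and the parameter names are all hypothetical, and the "model" just replays a canned script.

```python
from typing import Callable, List, Optional

# Hypothetical stand-in for a model: given the text so far and a token cap,
# return candidate continuations of at most that many tokens.
Model = Callable[[str, int], List[str]]
# Hypothetical stand-in for the user: pick (or edit) one candidate,
# or return None to stop generating.
Chooser = Callable[[List[str]], Optional[str]]


def generate_in_chunks(model: Model, choose: Chooser, prompt: str,
                       max_tokens_per_chunk: int = 8,
                       max_chunks: int = 4) -> str:
    """Build up code chunk by chunk, waiting for a user decision each time."""
    text = prompt
    for _ in range(max_chunks):
        candidates = model(text, max_tokens_per_chunk)
        if not candidates:
            break  # the model has nothing more to suggest
        chosen = choose(candidates)
        if chosen is None:
            break  # the user rejected every option, so stop here
        text += chosen
    return text


# Canned "tokens" that the toy model replays; purely illustrative.
SCRIPT = ["def ", "add(a, ", "b):\n", "    return ", "a ", "+ ", "b\n"]


def toy_model(text: str, max_tokens: int) -> List[str]:
    """Toy model: continue the canned script, never exceeding the token cap."""
    done, built = 0, ""
    # Count how many scripted tokens are already present in `text`.
    while done < len(SCRIPT) and text.startswith(built + SCRIPT[done]):
        built += SCRIPT[done]
        done += 1
    chunk = "".join(SCRIPT[done:done + max_tokens])
    return [chunk] if chunk else []


if __name__ == "__main__":
    # A "user" who always accepts the first option shown.
    print(generate_in_chunks(toy_model, lambda options: options[0], "",
                             max_tokens_per_chunk=3))
```

The point of the design is that no single call to the model can return more than `max_tokens_per_chunk` tokens, and nothing is appended to the text without passing through the user's choice first.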

For the second, detection and attribution are hugely important. With AI codegen, we actually have an opportunity to try to improve on the status quo and automatically assign and embed credit where it is deserved. Attribution is not an easy thing to do with machine learning models, but we are thinking about how to do it properly with Codeium, and are investing time and effort in developing this aspect of the product. If you have any thoughts on how to define proper attribution or what a good attribution user experience would look like, we would love to hear them!

Next Steps

How can I give feedback?
We believe that this next big step in coding experience will only come with widespread developer testing and feedback. The easiest way to give suggestions is to join us on Discord and start a conversation in #feedback! Or, if you don't have any specific ideas but want to rate your experience with Codeium, take the 1 minute feedback survey.
Will there be code editors other than VSCode and JetBrains?
Yes! Let us know your code editor of preference in the 1 minute feedback survey so we know which ones to prioritize.

Long Term Vision

How do you plan to make money?
Fair question whenever anything is billed as free forever, especially when there are clear infrastructure serving costs, as there are for us! To be clear, this free forever deal is in order to provide some special benefit to early adopters like yourself who have supported Codeium from the beginning. We are actively working on a bunch of exciting additional features that we may consider as part of a paid Pro subscription, but we are committed to offering state-of-the-art code completion forever for free to you and as cheaply as possible to every developer at large.
Where is this heading?
We have a pretty grand vision for how we think the coding process can evolve, which is why we refer to Codeium as a code acceleration tool rather than purely a code generation tool (of course, there's a lot more that can be done for code generation too!). We want to hear from you on what parts of your day-to-day coding workflow are boring, draining, or downright infuriating. We want to optimize for making the most developers the most happy - tell us more about your current experiences.
How is this different from Copilot, Codegen, etc?
We tried them all! In its current form, Codeium gives you similar functionality and quality, except it is cheaper than Copilot (free!) and more usable than a self-hosted Codegen model (reasonable latency of suggestions and tricks to get better, more performant results). We want to keep increasing the quality and usability of the code completion, and explore how we can add more functionality to create a one-stop-shop for code acceleration. We believe our philosophy - (a) pairing state-of-the-art ML with world class ML infrastructure in a vertically integrated manner and (b) heavily relying on developer feedback to shape the product roadmap - is quite different from existing approaches, and will lead to a more usable, functional, and high-quality product.
Are you trying to build the singularity?
But wait, how do we know the singularity hasn't already happened? But on a serious note, no - we've seen how code has made the jobs of people in other industries less frustrating, and we just think it is the right time with the right set of technological breakthroughs to do the same for us developers as well. You're still in control, as it should be.