Artificial intelligence (AI) tools have been making waves in the tech world since the launch of ChatGPT in November 2022. These tools vary greatly in form and function, but one constant among them is that they aim to improve their users’ workflows and efficiency.
However, making effective use of these tools can be challenging without understanding how they work and how best to interact with them. Most of these tools, especially the ones based on OpenAI’s Generative Pre-trained Transformer (GPT) models, are large language models (LLMs). An LLM essentially works by taking an input prompt and predicting which text is most likely to follow that prompt, based on the data the model has been trained on.
OpenAI’s models have been trained on a tremendous amount of data, including software engineering, coding, and system design information. As such, the AIs built using these models can answer your questions in these fields and many others.
AI tools like GitHub Copilot, based on the OpenAI Codex model, and ChatGPT are widely used by developers to help them write code and solve technical problems. However, because they can only take a limited amount of context into account at once, these tools struggle with larger software design challenges.
This is where bleeding-edge tools like smol developer come into play. Smol developer is a tool built on GPT-3.5 and GPT-4 that aims to bring AI-enhanced workflows to higher-level software design, allowing you to generate an entire codebase from a given prompt.
Is this the silver bullet developers have been waiting for? In this article, you’ll see for yourself. You’ll be taught how to use smol developer to iteratively create a specification for a simple RESTful CRUD API that the AI will use to generate the codebase. You’ll be able to see the strengths and limitations of this approach and learn about any pitfalls that you should be aware of before incorporating this kind of AI into your workflows.
How to use AI for API design
There are a few things to note before you get started with this tutorial.
You can find the source code for this project in this GitHub repo. If you go through the README on the repository, you’ll see that you can run it two ways. The default way uses Modal for on-demand compute resources. However, this tutorial will use the non-Modal version, where you run the Python scripts locally on your machine. This means that the only dependencies you need to follow along are the following:
– Git
– Python
– pip (https://pypi.org/project/pip/)
– An OpenAI account and API key
– A code editor (VS Code is a good choice if you don’t have a preference).
You also need to ensure your OpenAI account has some credits or a payment method is configured. Each time you invoke the AI, it will be billed to your OpenAI account. The cost is fairly minimal per run. For instance, all the runs needed to write and debug this article cost around £1 (GBP).
Setup
With the prerequisites installed, you can set up the project. Run the following command in your terminal to clone smol developer onto your computer, and enter its directory:
bash
git clone https://github.com/smol-ai/developer
cd developer
Next, run the following command to install the Python dependencies using pip:
bash
pip install -r requirements.txt
Once this command has finished running, you need to export your OpenAI API key as a terminal environment variable so the script can access it. Do this by running the following command, with your API key substituted in:
bash
export OPENAI_API_KEY=<YOUR API KEY>
*Note:* If you were using Modal, there’s an ENV file that would contain your API key, but the code is seemingly not set up to read from this file if you are running the tool locally. In this case, you need to export the API key manually.
Once your API key is exported, you can test if everything is working as expected by running the following command:
bash
python3 main_no_modal.py "a simple hello world application written in nodejs"
*Note:* Depending on your operating system and how Python is installed, your executable may be called either python3 or just python. The code blocks in this tutorial use python3, so you may need to adjust these depending on your setup.
Running this command invokes the smol developer tool, which takes your prompt and runs it through some preliminary steps to compose a more advanced prompt for the AI. This tool works by using your prompt to first generate a list of files that the AI believes would be needed to create the project.
Next, each one of these file names is given to the AI, along with the original prompt, at which point it is instructed to generate the supposed content of that given file. This process is repeated for all the files the AI suggested, ultimately resulting in a complete codebase if all goes well. From the earlier input, you should see some output like this:
In this case, because the prompt is so simple, you can see that the AI has decided to only generate a single file, index.js, and that the file’s content is a simple “Hello, World!” console log statement. However, this verifies that the tool is working and that everything is hooked up correctly and ready for more advanced use cases.
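The two-step flow described above (ask for a file list, then ask for each file's content) can be sketched roughly as follows. This is a hypothetical illustration, not smol developer's actual code: `fakeModel`, `generateCodebase`, and the `LIST_FILES` / `WRITE_FILE` instruction prefixes are invented names, and the stubbed model stands in for a real LLM call so the sketch runs offline.

```javascript
// Hypothetical sketch of a two-phase generation loop; not smol developer's real code.
// `fakeModel` stands in for an LLM call so the example runs without network access.
function fakeModel(instruction) {
  if (instruction.startsWith('LIST_FILES')) {
    // Phase 1: the model proposes the files it thinks the project needs.
    return JSON.stringify(['index.js']);
  }
  // Phase 2: the model writes the content for one named file.
  return 'console.log("Hello, World!");';
}

function generateCodebase(prompt, model) {
  // Ask once for the list of files.
  const filePaths = JSON.parse(model(`LIST_FILES\n${prompt}`));

  // Ask again for each file, passing the original prompt and the file name.
  const generatedFiles = {};
  for (const filePath of filePaths) {
    generatedFiles[filePath] = model(`WRITE_FILE ${filePath}\n${prompt}`);
  }
  return generatedFiles;
}

const generated = generateCodebase(
  'a simple hello world application written in nodejs',
  fakeModel
);
console.log(generated);
```

Note that each file is requested in its own call. In this sketch, nothing one call produces is visible to the next, which is worth keeping in mind for the context issues discussed later.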
Define the API
Before asking the AI to build your application, you need a good idea of what you want. As you’ll soon see, this is very important when working with AI-powered tools. The AI will fill in any gaps in the specification with its own assumptions, which can quickly lead to unexpected outputs.
Trying to fully specify everything in one go is also unwise. A more moderate approach involves iteratively building up your specification and running it through the AI after each addition and modification to see if your changes have the desired effects.
As a basis, you need to know what you’re building. In this case, you’ll ask the AI to build a simple RESTful API for Node.js. The API is a simple model of a ticket management system with a schema that looks something like this:
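As a rough sketch, the three resources can be written out as plain JavaScript objects. The field names are taken from the specification used later in this article. Every sample value is invented, and the reading of `parent_id` as the id of the comment being replied to is an assumption.

```javascript
// Sample records for each resource. Field names come from the specification;
// all values below are made up for illustration.
const user = { id: 1, name: 'Ada' };

const ticket = {
  id: 10,
  summary: 'Login page returns 500',
  description: 'Submitting the login form fails with a server error.',
  author_id: user.id, // references User.id
};

const comment = {
  id: 100,
  content: 'I can reproduce this.',
  author_id: user.id, // references User.id
  ticket_id: ticket.id, // references Ticket.id
  parent_id: null, // assumed: id of the comment being replied to, or null
};

console.log({ user, ticket, comment });
```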
Build the API
To get started building the API, you need to create a new Markdown file in the repo to hold your specification. While you can provide your prompt directly in the command line, as with the previous example, the tool also accepts Markdown files, which are easier to work with as your specification grows in size and complexity.
Iteration 1
Create a file in the root of the repo named my-prompt.md and give it the following content:
md
"""
Create a RESTful CRUD API using express.js.
## Resources
The API has the following resources and fields:
### User
- id
- name
### Ticket
- id
- summary
- description
- author_id
### Comment
- id
- content
- author_id
- ticket_id
- parent_id
"""
*Note:* The `"""` markers on the first and last lines are generally needed. Without them, the script tends to crash intermittently, depending on the output that the AI generates. Adding these quote marks seems to mitigate this issue.
This specification is about as minimal as it can be while still covering the key details of what you want the AI to build. To have the AI execute this specification, run the following command:
bash
python3 main_no_modal.py my-prompt.md
This command prints a lot of output indicating what the AI is currently doing, but it follows the same general steps as the previous example. Initially, it decides what files and dependencies it needs, and then it creates the files one by one.
If you follow along, the code generated for you is very likely to be different from the code shown here. This is indicative of one of the issues with AI-powered tools like this. Specifically, they’re not deterministic, and you’re not guaranteed to get the same output each time for a given input. This will be expanded upon in the following section, but for now, just be aware that the code shown here will differ from what you generate, although the overall process of iterating should be quite similar.
Once the code is done generating, you can look inside the generated/ directory in the repo to see what the AI has created. In this case, it has generated these files:
You’ll see several files seemingly unrelated to the original prompt, such as an authentication and validation middleware and a database configuration file. For the index.js file, the following content was generated:
The AI has seemingly decided to use Mongoose to connect to an assumed MongoDB instance. Interestingly, there is no sign of the validation middleware from the file tree being used here. Looking at the validationMiddleware.js file, the following content was generated:
js
//middlewares/validationMiddleware.js
const { body, validationResult } = require('express-validator');
const validateUser = () => {
return [
body('name').notEmpty().withMessage('Name is required'),
];
};
const validateTicket = () => {
return [
body('summary').notEmpty().withMessage('Summary is required'),
body('description').notEmpty().withMessage('Description is required'),
body('author_id').notEmpty().withMessage('Author ID is required'),
];
};
const validateComment = () => {
return [
body('content').notEmpty().withMessage('Content is required'),
body('author_id').notEmpty().withMessage('Author ID is required'),
body('ticket_id').notEmpty().withMessage('Ticket ID is required'),
body('parent_id').optional(),
];
};
const validationMiddleware = (req, res, next) => {
const errors = validationResult(req);
if (errors.isEmpty()) {
return next();
}
const extractedErrors = [];
errors.array().map((err) => extractedErrors.push({ [err.param]: err.msg }));
return res.status(422).json({
errors: extractedErrors,
});
};
module.exports = {
validateUser,
validateTicket,
validateComment,
validationMiddleware,
};
This code looks fairly reasonable at a glance. However, searching for any of these functions in the generated code reveals that they’re not used anywhere. This indicates the second issue with this kind of tool: limited context. Because each file is generated individually, it’s not uncommon for the AI to generate functions that it never uses or to generate functions and then misuse them by providing incorrect or mismatched arguments. More on this in the next section.
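The search for unused functions can be automated with a small script along these lines. This is a minimal sketch using a plain substring search: `findUnusedExports` is a hypothetical helper, and the file contents are shortened stand-ins rather than the real generated output.

```javascript
// Hypothetical helper: reports exported names that no other file mentions.
// `sources` maps a file path to its source text.
function findUnusedExports(sources, exportingFile, exportedNames) {
  const otherSources = Object.entries(sources)
    .filter(([path]) => path !== exportingFile)
    .map(([, source]) => source);

  // A name counts as unused if it appears in none of the other files.
  return exportedNames.filter(
    (name) => !otherSources.some((source) => source.includes(name))
  );
}

// Shortened stand-ins for the generated files (not the real output).
const sources = {
  'middlewares/validationMiddleware.js':
    'module.exports = { validateUser, validateTicket, validateComment, validationMiddleware };',
  'index.js': "const express = require('express');\nconst app = express();",
};

const unused = findUnusedExports(sources, 'middlewares/validationMiddleware.js', [
  'validateUser',
  'validateTicket',
  'validateComment',
  'validationMiddleware',
]);
console.log(unused);
```

A substring search can produce false matches (for example, a name mentioned only in a comment), so treat the result as a hint rather than a proof.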
For now, the best thing to do is to add more detail to the specification and try again.
Iteration 2
The main issues to address in this iteration are the removal of unwanted validation and database code. Do this by updating the specification as follows, with the “Adjustments” section being added to contain extra directives for the AI:
md
"""
Create a RESTful CRUD API using express.js.
## Resources
The API has the following resources and fields:
### User
- id
- name
### Ticket
- id
- summary
- description
- author_id
### Comment
- id
- content
- author_id
- ticket_id
- parent_id
## Adjustments
- Do not validate requests or responses
- Do not use a database, just use an array as a datastore for now
"""
After running this prompt through the AI, the following files are generated:
*Note:* Each time you run the tool, it removes all existing _files_ from the generated/ directory but not the directories themselves. Any empty directories you see are just remnants from previous runs.
This looks to be in line with the new spec. Again, for the index.js, the following code has been generated:
At a glance, this looks better, as it doesn’t contain unwanted middleware. However, although it’s hard to tell without an editor to highlight them for you, this file contains several unused imports, specifically the following:
– uuid
– userController
– ticketController
– commentController
– userModel
– ticketModel
– commentModel
Moreover, something seems off with the route declarations: hovering over the functions they refer to in the editor shows no inferred function signatures. The problem is apparent when you open one of the route files, such as routes/user.js:
The route files do not export the functions the index.js code is trying to use (although the controllers export functions that match those names). As it is, this code would not run, as there is a mismatch in the patterns being used between the files. It should be possible to fix this by being more specific in the specification.
Iteration 3
To get more consistent output, add more detail to the specification by inserting the following sections after the ### Comment section and before the ## Adjustments heading:
md
## Module structure
Each module (users, tickets, and comments) should use the following structure. "User" is used as an example, and the name should be changed appropriately for each given module.
- modules
- users
- userRoutes.js
- userController.js
- userService.js
### userRoutes.js
This file should be used in the `index.js` like so:
// index.js
app.use('/users', userRoutes);
The file itself should contain mappings between a given endpoint and the controller method that serves it, like this:
// modules/users/userRoutes.js
router.get('/', userController.getAllUsers);
router.get('/:id', userController.getUserById);
router.post('/', userController.createUser);
router.put('/:id', userController.updateUser);
router.delete('/:id', userController.deleteUser);
### userController.js
This file should serve as an HTTP adapter, and should invoke business logic from the `userService`. This file should declare the following functions:
- getAllUsers
- getUserById
- createUser
- updateUser
- deleteUser
Each of these functions should invoke the function of the same name from the corresponding service.
### userService.js
This file should house the business logic for the module. This file will declare the following functions:
- getAllUsers
- getUserById
- createUser
- updateUser
- deleteUser
With this added detail about how modules should be structured, run the AI again. This time, the following files are generated:
This matches what was specified in the prompt, which is a good start. Checking index.js, it looks like the desired changes have been implemented:
Unfortunately, there are two major issues with this iteration. The _Ticket_ module seems mostly correct, having followed the specification for which functions should be defined in each of the files. However, this is not the case for the _User_ and _Comment_ modules, which invoke the _Express router_ in both their _routes_ and _controllers_, like so:
This means that these modules do not work and crash the application when it runs. The peculiar part is that the _Ticket_ module is structured correctly according to the specification. This might be related to the AI’s context limit, but it’s impossible to know for certain.
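For comparison, the layering that the specification asks for can be sketched without Express at all. This is an illustrative sketch, not the generated output. The function names follow the specification and the in-memory array follows the "Adjustments" section. The incrementing id scheme, the 404 and 201 status codes, and the `fakeResponse` stand-in are assumptions made so the example runs on its own.

```javascript
// userService.js (sketch): business logic over an in-memory array.
const users = [];
let nextId = 1; // assumed id scheme; the specification does not define one

const userService = {
  getAllUsers: () => users,
  getUserById: (id) => users.find((user) => user.id === id),
  createUser: (data) => {
    const user = { id: nextId++, name: data.name };
    users.push(user);
    return user;
  },
};

// userController.js (sketch): an HTTP adapter that only translates between
// request/response objects and the service. It never touches the router.
const userController = {
  getAllUsers: (req, res) => res.json(userService.getAllUsers()),
  getUserById: (req, res) => {
    const user = userService.getUserById(Number(req.params.id));
    return user ? res.json(user) : res.status(404).json({ error: 'Not found' });
  },
  createUser: (req, res) => res.status(201).json(userService.createUser(req.body)),
};

// Minimal stand-in for an Express response so the sketch runs without Express.
function fakeResponse() {
  const res = { statusCode: 200, body: undefined };
  res.status = (code) => { res.statusCode = code; return res; };
  res.json = (payload) => { res.body = payload; return res; };
  return res;
}

const created = userController.createUser({ body: { name: 'Ada' } }, fakeResponse());
const missing = userController.getUserById({ params: { id: '99' } }, fakeResponse());
console.log(created.statusCode, created.body);
console.log(missing.statusCode, missing.body);
```

In a real project, a routes file would map endpoints to these controller functions, as in the `router.get('/', userController.getAllUsers)` lines from the specification. Only the routes file would need to know about the Express router.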
Further steps
This process could go on for a long time before resulting in something that is completely correct and able to run. During testing, even the most promising generated outputs still had a few issues that needed to be fixed manually.
If you’ve been following along, try refining the prompt further to see if you can get an output that is closer to what you expect. However, be warned that this could take anything from a couple of runs to a great many.
The next section will examine some of the learnings from this process to determine how useful AI-powered tools like this are at this early stage.
Findings and observations
Smol developer is an impressive tool that goes beyond what other AI coding tools currently offer. Most other tools, such as Copilot, are limited to generating single lines or blocks of code rather than entire codebases. However, since it’s still fairly young, some issues need to be addressed when viewing the tool through the lens of productivity and effectiveness.
Non-deterministic
One of the biggest challenges with using LLM-based AI tools is the lack of strict determinism. This is typically not an issue for tools like Copilot, where you generate small pieces of code at a time, and it’s the developer’s responsibility to incorporate them responsibly.
However, when generating an entire codebase, it’s more of an issue. Being non-deterministic means that the code that gets generated might never be generated in the same way again, which is particularly challenging when you need to repeatedly change your specification and regenerate the code.
It also brings into question the value and role of the specification. As you’ve seen, AI can still make mistakes, even concerning things explicitly mentioned in the specification. Couple this with the fact that the output generated from a given specification is liable to change, and the specification is treated less like a source of truth and more like a suggestion.
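One common source of this variation is that LLMs typically sample each next token from a probability distribution rather than always taking the single most likely one. The toy sketch below illustrates that idea with an invented three-word distribution. It is not how smol developer or the OpenAI API is implemented, and the `pickToken` function and its `temperature` handling are simplified assumptions.

```javascript
// Toy next-token distribution; the words and probabilities are invented.
const nextToken = { mongoose: 0.5, array: 0.3, sqlite: 0.2 };

// Picks a token. With temperature 0 it always takes the most likely token;
// otherwise it samples in proportion to p^(1/temperature).
function pickToken(distribution, temperature, random = Math.random) {
  const entries = Object.entries(distribution);
  if (temperature === 0) {
    return entries.reduce((best, entry) => (entry[1] > best[1] ? entry : best))[0];
  }
  const weights = entries.map(([token, p]) => [token, Math.pow(p, 1 / temperature)]);
  const total = weights.reduce((sum, [, weight]) => sum + weight, 0);
  let threshold = random() * total;
  for (const [token, weight] of weights) {
    threshold -= weight;
    if (threshold <= 0) return token;
  }
  return weights[weights.length - 1][0]; // guard against rounding error
}

// Greedy picks are repeatable; sampled picks can differ from run to run.
const greedy = [1, 2, 3].map(() => pickToken(nextToken, 0));
const sampled = [1, 2, 3].map(() => pickToken(nextToken, 1));
console.log(greedy, sampled);
```

A single different pick early on (for example, "mongoose" instead of "array") changes everything generated after it, which is one way the same specification can lead to quite different codebases.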
Lack of context
AI tooling, in general, tends to have problems with context. The tools are unaware of the business domain or most of the surrounding code. Because of this, tools like Copilot often make suggestions that don’t make sense, and you have to be mindful of which suggestions you accept.
In comparison, smol developer creates the codebase from scratch, and you’d be forgiven for assuming that it therefore has access to the entire context. However, smol developer still generates files one at a time and, as such, is prone to making similar mistakes. For example, even in iteration 3, the signatures declared on the service functions did not match the way those functions were called:
This type of issue can result in the code needing a lot of manual rework if you can’t iron it out in the specification.
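A hypothetical illustration of this kind of mismatch is shown below. The `updateUser` name follows the specification, but the code is invented for this sketch and is not the generated output. The service expects an id and a data object, while the caller passes a single merged object.

```javascript
// Service written in one generation step: expects (id, data).
const users = [{ id: 1, name: 'Ada' }];

function updateUser(id, data) {
  const user = users.find((candidate) => candidate.id === id);
  if (!user) return undefined;
  Object.assign(user, data);
  return user;
}

// Caller written in a separate step, unaware of that signature: it passes
// one merged object, so `id` is an object and no user ever matches.
const mismatched = updateUser({ id: 1, name: 'Grace' });

// The call the service actually expects.
const matched = updateUser(1, { name: 'Grace' });

console.log(mismatched); // undefined
console.log(matched);
```

Because JavaScript does not check argument types or counts, the mismatched call fails silently instead of raising an error, so this class of bug tends to surface only at runtime.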
Smol developer can run using either GPT-3.5-turbo or GPT-4. Unfortunately, GPT-4 is still in closed beta at the time of writing, while GPT-3.5-turbo is available. GPT-3.5-turbo generally has a harder time keeping context compared to GPT-4. The author of smol developer notes that performance in this regard is better when using GPT-4, so if you have access to it, you should use it instead.
Consistency
Similar to the issue with context, consistency issues are not uncommon. Specifically, in iteration 3, despite the specification outlining the structure of each module, the _Ticket_ module was generated as specified, while the other two were not. This makes the issue particularly difficult to debug. It isn’t simply a case of inadequate specification, as one of the modules was generated as specified. This problem with repeatable consistency is likely related to the next point.
Standardisation
Unlike traditional coding, which relies on strictly defined standards and syntax, LLM AIs accept unconstrained textual input. There is no standardised description language that you can use to convey your absolute intent to the AI. The output you get depends entirely on the AI’s non-deterministic interpretation of your words. This makes it a difficult foundation to build an application on, and the quality of the code you get depends partly on your specification and just as much on luck.
Wrap up
This article shows how LLM AI tools can be leveraged to help with the API design and development process. Here, you’ve seen how bleeding-edge tools like smol developer can rapidly create large amounts of code based on your provided specifications, and you’ve seen how you can iteratively tweak these specifications to move the output toward your desired outcome. AI tools have already found a firm foothold in the toolsets of many developers and aren’t likely to go away anytime soon.
However, tools that promise to create entire codebases for you still have a way to go before they’re suitable for production use. This current generation of AI developer tools lacks some key foun