An Audioguide about the Geology of Ecuador

Hello again. I've been a little AFK lately for a variety of reasons: my admission to college, a couple of life events, and some time away from home. But now I'm back.

Today I'll write about a project I worked on during the first week of August.

As you may have realized throughout this entire blog, all the projects I write about are 100% mine; this one, however, was not for me but for my sister. She studies geology and had an assignment with creative freedom, so, to help her finish it on time, I temporarily set aside Rubikraze (my current personal project).

Like I said, my sister could choose any format for her assignment; the topic was the geological formations of Ecuador. So, she asked me if I could make an HTML file with a map of Ecuador and a list of audios (recorded by her, of course) talking about different geological formations.
As the good brother I am, I said yes; the file itself was not that complex or sophisticated: it was just an image of a map where you could play an audio by pressing its number on the keyboard. By the nature of the assignment, she had to deliver something new twice a month; so every two weeks she would record a couple more audios, I would add them to the HTML file, and she would send that as her homework. Nothing crazy.

However, for the assignment's final delivery, she wanted to do something more relevant. My sister, wanting to join the AI train, proposed making an AI chatbot that would answer questions about specific data on geological formations in Ecuador. So of course OpenAI's GPT API was perfect for this, because we didn't have a couple million dollars to spare on hardware and a specialized workforce to train our own LLM.

Using OpenAI's API meant that we couldn't do everything on the client side anymore, because we would have to keep the API key secret. So, the solution was obvious: just make it a website.
I've been programming for a long time, but I never really had the need nor the means to pay for a website, so deploying one was something I had never done before. This was not a terribly complex project compared to some others I've worked on that involved a lot of problem solving, like Convertool, BitNGo, or Rubikraze; but it was definitely new terrain, so it was a great learning experience.

Attempts at Self-Hosting

A while ago I bought a used laptop in (very) poor condition, removed the useless, damaged parts (keyboard, speakers, battery), and installed Linux Mint on it; it was useful for learning Linux, and now I use it as a homelab.

My first idea for my sister's project was to use said Linux machine as a web server for the website. I got as far as setting up Nginx and configuring my router to forward requests on port 443 to it, but some limitations on my ISP's side meant I couldn't reach it from outside my home network. So, sadly, I had to scrap the idea of self-hosting the website.

Domain Acquisition

The project wasn't meant to be live for more than one month, and it was a college assignment, so I didn't exactly need to protect a brand and acquire an expensive .com domain name. I took my time comparing different providers and domain extensions, and settled on the domain geoecuador.site, which cost me only $0.99 plus an ICANN fee of $0.18, for a grand total of $1.17.

Web Hosting Acquisition

Web hosting is probably one of the most competitive markets at the moment, so I had plenty of options while looking for a service that offered SSL certificates and month-to-month service (the market is flooded with providers that sell years, not months, of hosting). Ultimately I came across a really good deal: one month of web hosting for $6, with a 90% discount for new clients. So yeah, I definitely purchased it (their customer service was great too) and secured reliable web hosting for the website for $0.60. I plan on migrating this blog from Blogspot to my personal website, along with some projects to show off as a portfolio, so I will definitely consider their hosting service again in the future.

Developing the Frontend

Because the previous versions of the frontend were so simple, I hadn't taken much time to write the best code possible, so some rewriting was necessary.

This is perhaps the worst case of what I'm talking about:
I was adding audios every two weeks, so there was no need for fancy, elegant code. Over time, it started looking like this:
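Think of a pile of hard-coded branches along these lines (an illustrative sketch, not the exact code; the element IDs are made up):

```javascript
// Illustrative sketch of that style of code: one hard-coded branch per
// audio, with a new line pasted in every time a new recording arrived.
document.addEventListener("keydown", function (e) {
  if (e.key === "1") { document.getElementById("audio1").play(); }
  if (e.key === "2") { document.getElementById("audio2").play(); }
  if (e.key === "3") { document.getElementById("audio3").play(); }
  if (e.key === "4") { document.getElementById("audio4").play(); }
  // ...and so on, one more line for every new audio.
});
```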


This is abhorrently ugly, of course, and it was written that way because, at the time, there wasn't really a need for anything better. But now it was about time to write something better. I came up with this:
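In spirit, it was something like this (again an illustrative sketch rather than the code verbatim): a single handler that looks the audio up instead of hard-coding every case.

```javascript
// Sketch of the rewritten approach: collect every <audio> element once and
// map the pressed number key to its position in the list.
const audios = Array.from(document.querySelectorAll("audio"));

document.addEventListener("keydown", function (e) {
  const index = parseInt(e.key, 10) - 1; // key "1" -> audios[0]
  if (!Number.isNaN(index) && audios[index]) {
    audios[index].play();
  }
});
```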


This was considerably better and worked pretty well for a while (although it wasn't completely bug-free).
While working on other aspects of development, I talked to my friend Marco about the project, and he was curious to see it. Marco's specialty is frontend development, and after spotting a couple of bugs in parts of the webpage's functionality, he offered to help with important parts of the frontend, that specific code being one of them. He came up with the code that was used in the live final version:
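The exact code is in the repo; the general idea (sketched here with illustrative names) was to route all playback through a single controller, so starting a new audio always stops the one that was playing.

```javascript
// Rough sketch of the general idea (see the repo for the real code):
// keep track of the currently playing audio and stop it before starting
// a new one, so two audios can never overlap.
const audios = Array.from(document.querySelectorAll("audio"));
let current = null;

function playAudio(index) {
  const next = audios[index];
  if (!next) return;
  if (current && current !== next) {
    current.pause();
    current.currentTime = 0; // rewind the audio that was playing
  }
  current = next;
  current.play();
}

document.addEventListener("keydown", function (e) {
  const index = parseInt(e.key, 10) - 1;
  if (!Number.isNaN(index)) playAudio(index);
});
```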


Teamwork with Git

I only had one week to complete everything my sister asked for, so originally I wasn't planning on upgrading the audio system much more. But now, with Marco's help, I could focus entirely on the chatbot's backend and frontend functionality while he improved the buggy or ugly parts of the audio list.

Before actually starting to code, we had a brief call about the webpage's design; we used Microsoft Paint to sketch some ideas and concepts.
He would work on some under-the-hood optimization (like the code I showed before) and on implementing a progress bar for the audios, along with a pause button; an interface similar to Spotify's or Windows Media Player's.
I, on the other hand, would focus on designing and implementing the AI chatbox inside the website, and on coding and deploying the backend to handle API calls and connect the frontend to the backend.

Of course, we used Git and GitHub, each of us working on his own branch.

Marco's Contributions

Apart from optimizing a big part of the JavaScript code, Marco implemented a progress bar for each audio using vanilla JavaScript and CSS to keep development simple. We took inspiration from audio controls like Windows Media Player's; the end result was this:




The blue color was taken from the logo my sister had made initially, in order to maintain a consistent visual interface; it worked pretty well.
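If you're curious how a control bar like that works under the hood, it mostly comes down to syncing an element's width with the audio's timeupdate event. A minimal vanilla JS sketch (element IDs are illustrative):

```javascript
// Minimal sketch of a vanilla JS progress bar: fill the bar as the audio
// plays, and seek to the clicked point on the bar.
const audio = document.querySelector("audio");
const bar = document.getElementById("progress-bar");   // clickable track
const fill = document.getElementById("progress-fill"); // colored fill

audio.addEventListener("timeupdate", function () {
  const percent = (audio.currentTime / audio.duration) * 100;
  fill.style.width = percent + "%";
});

bar.addEventListener("click", function (e) {
  if (!audio.duration) return; // metadata not loaded yet
  const rect = bar.getBoundingClientRect();
  const ratio = (e.clientX - rect.left) / rect.width;
  audio.currentTime = ratio * audio.duration;
});
```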

Marco's contributions were very important; he also made sure to keep the audio system bug-free. Previously, for instance, under certain conditions two or more audios could play simultaneously. With Marco's new code, that kind of bug didn't happen.

Link to the repo in case you want to check out Marco's contributions more closely.

Chatbot's Frontend

I kept the frontend part simple: a bubble with a robot logo that opened a small chatbox window. Again, I used the site's blue color and a lighter tone for the user's message boxes. For the font family, I used the same one my sister used to design the audio boxes: KG Primary Penmanship.
My sister also wanted to name the AI assistant GeoBot, so the chatbox included its name on top.
The chatbox also expanded as more questions were asked and answered, optimizing screen space.
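Functionally, the chatbox boils down to posting the user's message to the backend and appending the reply. A stripped-down sketch (element IDs and the /chat endpoint name are illustrative):

```javascript
// Stripped-down sketch of the chatbox logic: send the user's question to
// the backend and append GeoBot's reply to the conversation.
async function sendMessage() {
  const input = document.getElementById("chat-input");
  const messages = document.getElementById("chat-messages");
  const question = input.value.trim();
  if (!question) return;

  appendBubble(messages, question, "user-bubble");
  input.value = "";

  const response = await fetch("/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message: question }),
  });
  const data = await response.json();
  appendBubble(messages, data.reply, "bot-bubble");
}

function appendBubble(container, text, className) {
  const bubble = document.createElement("div");
  bubble.className = className;
  bubble.textContent = text;
  container.appendChild(bubble);
}
```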
This was the end result (note that this screenshot was made with an already working backend, obviously):


Backend and API Request

Writing the frontend was just half the work. To have a working assistant, the backend had to make requests to OpenAI's API with the proper "prompt engineering" (if that's even a real thing).
So, I wrote a basic Axios and Express.js setup, and then it was time to really get into communicating with the LLM.
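The skeleton was more or less this (a simplified sketch; the route name and port are illustrative):

```javascript
// Simplified sketch of the Express setup: serve the static frontend and
// expose one endpoint the chatbox can POST questions to.
const express = require("express");
const app = express();

app.use(express.json());           // parse JSON bodies from the chatbox
app.use(express.static("public")); // serve the map, audios, and chatbox UI

app.post("/chat", async (req, res) => {
  const userMessage = req.body.message;
  // ...the OpenAI request goes here (shown further down)...
  res.json({ reply: "..." });
});

app.listen(3000, () => console.log("GeoBot backend listening on port 3000"));
```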

By the way, API keys and secrets should always live as environment variables in a .env file (or Codespaces secrets if you're coding in the cloud) that is excluded from staging via .gitignore, so the key can't be used by someone else. This is pretty obvious to most, but you'd be surprised by the number of people who commit API keys to their public GitHub repos.
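In Node, that boils down to something like this (a minimal example):

```javascript
// Minimal example: keep the key in a .env file (and add ".env" to
// .gitignore so it is never committed), then load it at startup.
//
//   .env  ->  OPENAI_API_KEY=your-key-here
//
require("dotenv").config(); // npm install dotenv
const OPENAI_API_KEY = process.env.OPENAI_API_KEY;
```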

My sister's main idea for the chatbot was to let users ask questions about different geological formations in Ecuador and receive answers based on a dataset of those formations. For that, she sent me an XLSX file with all the data the AI needed, and I converted it to CSV to make it easier for the AI to read.

So, there is something called fine-tuning in the OpenAI ecosystem, and apparently it is useful for adjusting a pretrained model to a specific dataset to improve performance on specific tasks. This definitely sounds like the perfect way to set up GeoBot; however, fine-tuning turned out to be way too complex for me. Also, with how little money I had spent, I could only fine-tune the older gpt-3.5 versions instead of the newer, faster, and cheaper gpt-4o versions.
Besides, to fine-tune a model you need an extensive dataset of example questions and answers (more than 500 for better results) in JSONL format, and even then your fine-tuning job can fail under some circumstances. Not to mention, there aren't a whole lot of resources on the internet about fine-tuning for absolute beginners like myself, so that was another disadvantage.

Complexity, older versions, way too much work, and lack of expertise on my part were the reasons I decided not to use fine-tuning for this project. It just wasn't that necessary, so simpler and faster alternatives were best suited for my specific case. That's why I opted to configure everything directly within the API request's prompt.

OpenAI's API is paid: you can either prepay credits for later consumption, or have a billing plan where you use the API normally and, at the end of the billing term, get charged for what you've used. The prepaid credits option seemed more fitting for this project, so I bought $5 in credits (the minimum allowed) and didn't have to worry about payments later.
When making requests to the API, there are four important parameters: version, system, user, and temperature.
The version refers to the GPT model you want to send the request to (GPT-3.5, GPT-4, GPT-4o, GPT-4o-mini, etc.); different versions have different costs, more on that later.
The system message works as a context provider, defining the AI's role, tasks, and limitations. The AI is expected to follow these instructions and not disobey them.
The user message is the question or message that the end user sends to the AI; for example, the messages you send when you talk to ChatGPT are most likely processed as user content.
The temperature is a float that defines the randomness of the AI's output, 0 being the least creative and 2 being the most unpredictable.

Knowing how the API parameters worked, I could start designing how the backend would work. Every time someone sent a message to GeoBot, the backend would take it and include it in the API request as the "user" content. But the system content was the important part, since that was what defined the AI's responses and limitations; and let's not forget that I also had to include the whole CSV dataset of geological formations in the system content, so the AI's responses could be based on that dataset.
Finally, the temperature could simply be set to 0.7, a standard balance between an eloquent and a precise answer.

After some rounds of trial and error, this is the API request that fit best (the original was in Spanish; this is a translated version):

Version: gpt-4o (I'll explain the reason behind this choice later).
System: Your name is GeoBot, you are a geology expert with a list of geological formations from Ecuador. For any questions that are not closely related to geology, tell the user that you cannot help with that, as your area of expertise is geology. For questions about formations in Ecuador, answer in a single sentence, and keep it brief. Do not use Markdown language, only natural language. If asked about a formation whose name appears in two or more formations, provide information about both. If the user doesn’t ask any questions, tell them your name is GeoBot and that you can help with questions like "Tell me about the Yunguilla formation". Base it on the following CSV table: [CSV table of formations].
User: [message sent to the chatbox]
Temperature: 0.7
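Translated into backend code, the request looks roughly like this (a simplified sketch: the CSV file name and the shortened system text are placeholders for the real ones):

```javascript
// Simplified sketch of the backend call to OpenAI's chat completions
// endpoint. The CSV file name and the trimmed system text are placeholders.
const fs = require("fs");
const axios = require("axios");

const formationsCsv = fs.readFileSync("formations.csv", "utf8");

async function askGeoBot(userMessage) {
  const response = await axios.post(
    "https://api.openai.com/v1/chat/completions",
    {
      model: "gpt-4o",
      temperature: 0.7,
      messages: [
        {
          role: "system",
          content:
            "Your name is GeoBot, you are a geology expert... " +
            "Base it on the following CSV table:\n" + formationsCsv,
        },
        { role: "user", content: userMessage },
      ],
    },
    { headers: { Authorization: "Bearer " + process.env.OPENAI_API_KEY } }
  );
  return response.data.choices[0].message.content;
}
```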

As I said earlier, the API has a cost, and that cost is determined by the tokens you use in each request. Tokens are the unit of measurement for the input and output size of an LLM request; the bigger the content you send and receive, the more tokens you use, and the more you are charged.

Because of the size of the API request I used, I consumed a lot of tokens in every single message, around 5000 (roughly five times more than what would be considered normal for this kind of application). This wasn't optimal, but it was the quick solution, and given the extremely tight deadline for the task, the quick solution was the only solution.

Choosing the Right GPT Version

Different GPT versions have different costs. With the $5 in prepaid credit I bought, I could use any model I wanted by specifying it in the "version" parameter of the API request.

I came up with a simple formula to calculate how many requests you're going to be able to make, taking into account the price per million tokens, the prepaid credit or budget, and the token usage per request:

\[ \frac{\text{Budget}}{\text{Token Price}} \times \frac{1000000}{\text{Token Usage}} \]

We know that we consume 5000 tokens per request, and that we have a $5 budget limit.
The cheapest model by far is gpt-4o-mini, which costs $0.15 per million tokens. Plugging in those values, we get:

\[ \frac{$5}{$0.15} \times \frac{1000000}{5000}\approx 6666 \]

So, with a $5 budget and a usage of 5000 tokens per request, we get around 6666 requests with gpt-4o-mini.
But as you may have noticed, I didn't end up using gpt-4o-mini for this project, because in testing it frequently hallucinated or provided incorrect answers, which is expected given its lower power compared to other models. Although it was highly cost-effective, gpt-4o-mini just wasn't a reliable model for this use case.

I conducted some testing with gpt-4o, and its responses were much better and more precise. Gpt-4o was significantly more powerful, though also considerably more expensive, at $5 per million tokens; almost 34x the price of gpt-4o-mini. I calculated the number of requests I could make with gpt-4o:

\[ \frac{$5}{$5} \times \frac{1000000}{5000}\approx 200 \]

Around 200 requests with gpt-4o, far fewer than with gpt-4o-mini; but it was kind of the only real option, since wrong or inaccurate answers were simply unacceptable.
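As a quick sanity check, here's the same formula as a small throwaway helper, using the prices mentioned above:

```javascript
// requests = (budget / pricePerMillionTokens) * (1,000,000 / tokensPerRequest)
function estimateRequests(budget, pricePerMillionTokens, tokensPerRequest) {
  return (budget / pricePerMillionTokens) * (1_000_000 / tokensPerRequest);
}

console.log(estimateRequests(5, 0.15, 5000)); // gpt-4o-mini -> ~6666 requests
console.log(estimateRequests(5, 5, 5000));    // gpt-4o      -> 200 requests
```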

Backend and frontend were ready. I merged my backend development branch with Marco's frontend branch, resolved a couple of merge conflicts here and there, and everything was in order.

GeoBot was ready, with the backend API requests and the frontend chatbox working; now it was just a matter of polishing some details.

Final Details

I wanted the audios to be as easy to browse as possible. Marco had planned on adding "previous" and "next" audio buttons to the control bar, but on the last day of development he got caught up with things outside our project and had to leave; of course, his contributions were already more than enough.

I took care of the control bar's last features; I also rewrote some parts of Marco's code that could be improved, like the play button's positioning. Here you can see the difference:

Before:

After:

Final Result

After deploying everything on the web server I acquired, configuring the DNS settings at the domain provider, and waiting for DNS propagation, everything worked perfectly. This is a demonstration of the website in action:



Notice how, in the AI demonstration, an audio starts playing; that's because the original plan was to use the keyboard to play the audios, so when I typed "3", the third audio started playing.
These recordings are old; the website stopped being available when the month of web hosting expired. You can visit the GitHub repo if you want to check it out, though.

In the end, we spent a total of $6.77: $1.17 for the domain name, $0.60 for web hosting, and $5 for the minimum amount of OpenAI credit.

This was a great project to work on, and I learned a lot working on it. Special thanks to Marco for helping out with the frontend, and to my sister for trusting me with this project.

As always, thanks for reading.








