The Challenge of Learning

When I started building in AI, LlamaIndex had only recently started their Discord server and I was still figuring out how to build GenAI apps. At the time, I was working on a product called interviewify, which was basically a Zoom note-taker tailored towards user interviews.

I remember spending countless hours reading up on vector embeddings, vector store indices, knowledge bases: what was a generic concept, what was specific to LlamaIndex, how were vector indices different from vector stores? All of these questions led me down what felt like an infinite loop of understanding & misunderstanding. Much of this questioning took time away from actually building & iterating, which is a critical way of learning, especially when it comes to greenfield areas like GenAI.

My friend Adam Towers texted me this and it really resonated with me:

While you’re actively innovating and improving, you want to learn as fast as possible.

This quote perfectly encapsulated my experience. I realized that to truly learn and innovate in AI, I needed to focus on rapid iteration and hands-on experience rather than getting bogged down in theoretical concepts.

The Power of Deadlines

The moment where I really started learning a lot was when I gave myself a deadline. The incubator I was in had a demo day, DubHacks Next Demo Day, where we would present to a few hundred people, and we needed to have everything ready to go by March. I remember feeling nervous because up until that point I had only built out tiny parts of the AI system and tinkered with different prototypes, and a lot of it would work one day and break the next. I learned that it's important to pin the versions in your requirements.txt, especially given how fast AI-related packages like LlamaIndex and LangChain move and introduce breaking changes. One day I had a working app; the next day I ran a pip install that broke my entire codebase and put me back at square one.
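
For anyone running into the same thing, pinning exact versions in requirements.txt (and regenerating it with pip freeze once a setup works) is the simplest fix. The version numbers below are purely illustrative:

```
# requirements.txt — pin exact versions so a fresh `pip install -r requirements.txt`
# reproduces the environment you actually tested (versions here are illustrative)
llama-index==0.10.55
langchain==0.2.11
openai==1.35.0
```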

Building My Own RAG Pipeline

I reached a point of frustration where I thought to myself, “you know what, let me just create my own version of a RAG (Retrieval-Augmented Generation) orchestrator.” This ended up being a really important decision because it brought a lot of clarity.

It reminded me a bit of learning a language like JavaScript or Python, where you can build “pointers” in a binary tree without ever thinking about memory, versus learning a language like C++, where you’re actually allocating and manipulating memory and have to deeply understand what’s going on. Creating my own RAG pipeline actually simplified a lot of concepts for me because I had to understand what was happening at every step. I realized it wasn’t as complex as I was making it out to be.
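
To make that concrete, here’s roughly the shape a hand-rolled version takes. This is a minimal sketch, assuming OpenAI’s embedding and chat APIs plus plain NumPy; the model names, helper names, and the transcript file are placeholders for illustration, not the exact code from interviewify.

```python
# A bare-bones RAG pipeline: chunk -> embed -> retrieve -> generate.
# Assumes `openai` and `numpy` are installed and OPENAI_API_KEY is set.
import numpy as np
from openai import OpenAI

client = OpenAI()

def chunk(text: str, size: int = 500) -> list[str]:
    # Naive fixed-size chunking; real pipelines split on sentences or sections.
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

def retrieve(query: str, chunks: list[str], chunk_vecs: np.ndarray, k: int = 3) -> list[str]:
    q = embed([query])[0]
    # Cosine similarity between the query vector and every chunk vector.
    sims = chunk_vecs @ q / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q))
    return [chunks[i] for i in np.argsort(sims)[::-1][:k]]

def answer(query: str, chunks: list[str], chunk_vecs: np.ndarray) -> str:
    context = "\n\n".join(retrieve(query, chunks, chunk_vecs))
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
    )
    return resp.choices[0].message.content

# Usage: index a transcript once, then ask questions against it.
docs = chunk(open("interview_transcript.txt").read())
vecs = embed(docs)
print(answer("What pain points did the user mention?", docs, vecs))
```

That’s really it: an embedding call, a similarity search, and a prompt. Most of what a framework adds on top (node parsers, retrievers, response synthesizers) is a variation on one of those three steps.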

After I really understood how RAG worked, I started experimenting with different concepts: visualizing clusters of vector embeddings using t-SNE (t-Distributed Stochastic Neighbor Embedding) and UMAP (Uniform Manifold Approximation and Projection), trying different chunking methods, injecting metadata, etc. These techniques helped me visualize and organize high-dimensional data in more manageable ways.
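
As an example, projecting the chunk embeddings down to two dimensions makes it easy to eyeball whether related chunks actually land near each other. This sketch assumes the umap-learn and matplotlib packages and reuses the vecs array from the pipeline sketch above:

```python
# Project high-dimensional embeddings to 2D with UMAP to visually inspect clusters.
import matplotlib.pyplot as plt
import umap  # provided by the `umap-learn` package

# `vecs` is the array of chunk embeddings computed earlier.
coords = umap.UMAP(n_components=2, metric="cosine", random_state=42).fit_transform(vecs)

plt.scatter(coords[:, 0], coords[:, 1], s=10)
plt.title("Chunk embeddings projected to 2D with UMAP")
plt.show()
```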

Key Lessons Learned

Something I underestimated early on was prompt engineering. It’s easy to come up with a prompt and think that it’s robust; once you test with enough queries, though, you’ll see that there are almost always edge cases where you’ll need to iterate. Learning this skill was really important for me because it allowed me to iterate towards getting consistent output out of LLMs.
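
What eventually helped was keeping a small set of awkward queries around and re-running them every time the prompt changed, instead of trusting a single happy-path test. Here’s a rough sketch that reuses the retrieve(), client, docs, and vecs pieces from the pipeline above; the queries and template are placeholders:

```python
# Re-run a fixed set of tricky queries whenever the prompt changes and
# check the outputs for consistency (manually, or with assertions).
EDGE_CASES = [
    "Summarize an interview where the user said almost nothing.",
    "What did the user think about pricing?",   # a topic that may never come up
    "Reply with exactly three bullet points.",
]

# Making the constraints explicit (fallback answer, output format) is most of the battle.
PROMPT_TEMPLATE = (
    "Answer using only the context below. "
    "If the context does not contain the answer, reply \"Not discussed.\"\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)

for question in EDGE_CASES:
    context = "\n\n".join(retrieve(question, docs, vecs))
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": PROMPT_TEMPLATE.format(context=context, question=question)}],
    )
    print(question, "->", resp.choices[0].message.content, sep="\n", end="\n\n")
```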

I think the biggest mistake I’ve made when it comes to building with LLMs is thinking more than doing. The advice I’d give myself is to have a strong bias for action: if you don’t understand something, just go do it and figure it out. That’s how I’ve learned most of what I’m working on in GenAI so far.

Advice for Beginners

My suggestion would be to:

  1. Understand what RAG is
  2. Implement RAG using just a database, an LLM + embedding model provider’s API, and your own code
  3. See how that relates to LangChain and LlamaIndex (this lets you understand those packages much better; see the sketch after this list)
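
Once you’ve done step 2 by hand, the framework version is much easier to map onto it. For instance, the LlamaIndex quickstart below does roughly the same chunk/embed/retrieve/generate steps in a few lines. Treat it as a sketch: the import paths have shifted across versions, the "data" directory is a placeholder, and it assumes an OpenAI key is configured by default.

```python
# The same pipeline expressed with LlamaIndex: loading, chunking, embedding,
# indexing, retrieval, and prompting are handled by the framework.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("data").load_data()  # load and parse files
index = VectorStoreIndex.from_documents(documents)     # chunk + embed + store
query_engine = index.as_query_engine()                 # retrieval + prompting
print(query_engine.query("What pain points did the user mention?"))
```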

Once you establish a baseline it’ll be easier to iterate on top of this foundational knowledge.

Conclusion

As I reflect on my journey in AI engineering, from the early days of confusion to now, I’m struck by how much can be learned through hands-on experience. The key takeaways - understanding fundamental concepts, learning by doing, and continuous iteration - have been crucial to my growth in this field.

To those just starting out: don’t be discouraged by the initial complexity. The AI landscape may seem overwhelming at first, but with persistence and practical application, it becomes clearer and more manageable.

Helpful Resources

Here are some resources that you might find helpful:

Cheers,

Paulo