A blog dedicated to demystifying technology through powerful knowledge sharing.
Explore bite-sized insights, real-world examples, and key takeaways on essential tech concepts — made simple, practical, and impactful for curious minds and seasoned professionals alike.
Mobile application security is no longer optional—it’s essential. Whether you’re an Android, iOS, or Windows mobile developer, integrating automated security assessments into your CI/CD pipeline can drastically improve your app’s resilience against attacks. Enter MobSF (Mobile Security Framework)—an all-in-one toolkit for performing static and dynamic analysis of mobile apps.
In this guide, we’ll walk through setting up MobSF using Docker on macOS with Colima and demonstrate how to conduct both static and dynamic analysis.
๐ What is MobSF?
Mobile Security Framework (MobSF) is an open-source, automated mobile application pentesting, malware analysis, and security assessment tool. It supports:
Static & dynamic analysis
Mobile binaries: .apk, .ipa, .appx, .xapk
Source code (zipped)
REST APIs for CI/CD or DevSecOps integration
Whether you’re running tests during development or before release, MobSF provides valuable insights into the security posture of your app.
๐ Prerequisites
Before starting, ensure the following tools are installed on your macOS system:
Colima – Docker Desktop alternative for macOS
Docker
Installation Commands:
brew install colima
brew install docker
colima start
๐งช Running MobSF with Docker
Once Colima and Docker are set up, launch MobSF using the following command:
docker run -it --rm -p 8000:8000 opensecurity/mobile-security-framework-mobsf:latest
๐ Accessing the MobSF Dashboard
After launching MobSF, the terminal logs will include a line like:
Listening at: http://127.0.0.1:8000
Copy this URL into your browser to open the MobSF dashboard.
Uploading an App:
Simply drag and drop your APK/IPA file into the dashboard to begin static analysis.
๐งพ Static Analysis
Once uploaded, MobSF automatically scans the app and generates a security report. Monitor the Docker logs to verify successful completion or identify potential issues during analysis.
Sample Reports:
AppSec Scorecard for Prod Build (v2.9.8)
Full Static Analysis Report
✅ These reports help developers and security teams identify code-level vulnerabilities, permission misuses, and more.
๐ Dynamic Analysis
MobSF’s dynamic analysis enables runtime behavior assessment, allowing you to detect malicious operations or insecure runtime behaviors.
Learn how to run LLMs locally, explore top tools like Ollama & GPT4All, and integrate them with n8n for private, cost-effective AI workflows.
Have you ever worried about the costs of using ChatGPT for your projects? Or perhaps you work in a field with strict data governance rules, making it difficult to use cloud-based AI solutions?
If so, running Large Language Models (LLMs) locally could be the answer you've been looking for.
Local LLMs offer a cost-effective and secure alternative to cloud-based options. By running models on your own hardware, you can avoid the recurring costs of API calls and keep your sensitive data within your own infrastructure. This is particularly beneficial in industries like healthcare, finance, and legal, where data privacy is paramount.
Experimenting and tinkering with LLMs on your local machine can also be a fantastic learning opportunity, deepening your understanding of AI and its applications.
What is a local LLM?
A local LLM is simply a large language model that runs locally, on your computer, eliminating the need to send your data to a cloud provider. This means you can harness the power of an LLM while maintaining full control over your sensitive information, ensuring privacy and security.
By running an LLM locally, you have the freedom to experiment, customize, and fine-tune the model to your specific needs without external dependencies. You can choose from a wide range of open-source models, tailor them to your specific tasks, and even experiment with different configurations to optimize performance.
While there might be upfront costs for suitable hardware, you can avoid the recurring expenses associated with API calls, potentially leading to significant savings in the long run. This makes local LLMs a more cost-effective solution, especially for high-volume usage.
Can I run LLM locally?
So, you're probably wondering, "Can I actually run an LLM on my local workstation?". The good news is that you likely can do so if you have a relatively modern laptop or desktop! However, some hardware considerations can significantly impact the speed of prompt answering and overall performance.
Let’s look at 3 components you’ll need to experiment with local LLMs.
Hardware requirements
While not strictly necessary, having a PC or laptop with a dedicated graphics card is highly recommended. This will significantly improve the performance of LLMs, as they can leverage the GPU for faster computations. Without a dedicated GPU, LLMs might run quite slowly, making them impractical for real-world use.
The GPU's video RAM (vRAM) plays a pivotal role here: it determines the maximum size and complexity of the LLM that can be loaded and processed efficiently. More vRAM allows larger models to fit entirely on the GPU, leading to significantly faster speeds, as accessing model parameters from vRAM is orders of magnitude quicker than from standard system RAM.
LLMs can be quite resource-intensive, so it's essential to have enough RAM and storage space to accommodate them. The exact requirements will vary depending on the specific LLM you choose, but having at least 16GB of RAM and a decent amount of free disk space is a good starting point.
Software requirements
Besides the hardware, you also need the right software to effectively run and manage LLMs locally. This software generally falls into three categories:
Servers: these run and manage LLMs in the background, handling tasks like loading models, processing requests, and generating responses. They provide the essential infrastructure for your LLMs. Some examples are Ollama and Lalamafile.
User interfaces: these provide a visual way to interact with your LLMs. They allow you to input prompts, view generated text, and potentially customize the model's behavior. User interfaces make it easier to experiment with LLMs. Some examples are OpenWebUI and LobeChat.
Full-stack solutions: these are all-in-one tools that combine the server and the user interface components. They handle everything from model management to processing and provide a built-in visual interface for interacting with the LLMs. They are particularly suitable for users who prefer a simplified setup. Some examples are GPT4All and Jan.
Open source LLMs
Last, but not least, you need the LLMs themselves. These are the large language models that will process your prompts and generate text. There are many different LLMs available, each with its own strengths and weaknesses. Some are better at generating creative text formats, while others are suited for writing code.
Where can you download the LLMs from? One popular source for open-source LLMs is Hugging Face. They have a large repository of models that you can download and use for free.
Next, let's look at what are some of the most popular LLMs to get started with.
Which LLMs to run locally?
The landscape of LLMs you can run on your own hardware is rapidly expanding, with newer, more capable, or more specialized models being released every day!
Many powerful open-source models are available, catering to a wide range of tasks and computational resources. Let's explore some popular options, categorized by their general capabilities and specializations!
General-purpose model families
Several families of models have gained significant popularity in the open-source community due to their strong performance across various benchmarks and tasks.
Llama (Meta AI): The Llama series, particularly Llama 3 and its variants, are highly capable models known for their strong reasoning and general text generation abilities. They come in various sizes, making them adaptable to different hardware setups. The newest iteration, Llama 4, has been released, however, its size exceeds the capabilities of standard hardware for now.
Qwen (Alibaba Cloud): The Qwen family offers a range of models, including multilingual capabilities and versions optimized for coding. They are recognized for their performance, and tool calling abilities. Qwen 2.5 has extremely good performance, especially compared to its size. The recently launched Qwen 3 is even better across benchmarks!
DeepSeek: DeepSeek models, including the DeepSeek-R1 series, are often highlighted for their reasoning and coding proficiency. They provide strong open-source alternatives with competitive performance.
Phi (Microsoft): Microsoft's Phi models focus on achieving high performance with smaller parameter counts, making them excellent candidates for resource-constrained local setups while still offering surprising capabilities, particularly in reasoning and coding.
Gemma (Google): Gemma models represent a family of lightweight, state-of-the-art open models built from the same research and technology used to create Gemini models. They are designed to run on a single GPU making them ideal for local deployment! The latest iteration, Gemma 3, offers various sizes (e.g., 1B, 4B, 12B and 27B parameters) and is known for strong general performance, especially considering model size.
Mistral (Mistral AI): Mistral AI, a French company, offers a popular family of powerful and efficient open-source models (many under Apache 2.0 license), including the influential Mistral 7B and various Mixtral (Mixture of Experts) versions. These models are known for strong performance in reasoning and coding, come in diverse sizes suitable for local setups, and are praised for their efficiency.
Granite (IBM): IBM's Granite models are another family available for open use. The Granite 3.3 iteration, for example, offers variants with 2B and 8B parameters, providing options suitable for different local hardware configurations.
Models with advanced capabilities
Beyond general text generation, many open-source models excel in specific advanced capabilities:
Reasoning models: Models like DeepSeek-R1 and specific fine-tunes of Llama or Mistral are often optimized for complex reasoning, problem-solving, and logical deduction tasks. Microsoft’s Phi family of models also offer reasoning variants, in the form of phi4-reasoning and phi4-mini-reasoning.
Mixture-of-experts (MoE): This architecture allows models to scale efficiently by activating only relevant "expert" parts of the network for a given input. Qwen 3 is a MoE model, and Granite also has a MoE variant in the form of granite3.1-moe.
Tool calling models: The ability for an LLM to use external tools (like APIs or functions) is fundamental to building agentic AI systems. Models are increasingly being trained or fine-tuned with tool-calling capabilities, allowing them to interact with external systems to gather information or perform actions. Frameworks like LangChain or LlamaIndex often facilitate this when running models locally. Examples include qwen3, granite3.3, mistral-small3.1 and phi4-mini.
Vision models: sometimes also called multimodal models, are models that can understand and interpret images alongside text. They are becoming more common in the open-source space. Examples include Granite3.2-vision, llama3.2-vision, llava-phi3, and BakLLaVA (which is derived from Mistral 7B).
Models that excel at specific tasks
Sometimes, you need a model fine-tuned for a particular domain or task for optimal performance.
Coding assistants:
DeepCoder: A fully open-source family (1.5B and 14B parameters) aimed at high-performance code generation.
OpenCoder: An open and reproducible code LLM family (1.5B and 8B models) supporting chat in English and Chinese.
Qwen2.5-Coder: Part of the Qwen family, specifically optimized for code-related tasks.
Math and research
Starling-LM-11B-alpha: Mistral-based model for research and instruction-following.
Mathstral: Specialized Mistral AI model for advanced mathematical reasoning.
Qwen2-math: Part of the Qwen family, specifically optimized for complex mathematical problem-solving.
Creative writing
Mistral-7B-OpenOrca: A fine-tuned version of Mistral AI's base Mistral-7B model, specifically enhanced by training on a curated selection of the OpenOrca dataset.
Choosing the right open-source model depends heavily on your specific needs, the tasks you want to perform, and the hardware you have available. Experimenting with different models is often the best way to find the perfect fit for your local LLM setup.
How to run LLMs locally?
To run LLMs locally, the first step is choosing which model best fits your needs. Once you've selected a model, the next decision is how to run it—most commonly using software like Ollama. However, Ollama isn’t your only option. There are several other powerful and user-friendly tools available for running local LLMs, each with its own strengths.
Let’s explore some of the most popular choices below!
Ollama (+ OpenWebUI)
Ollama homepage
Ollama is a command-line tool that simplifies the process of downloading and running LLMs locally. It has a simple set of commands for managing models, making it easy to get started.
Ollama is ideal for quickly trying out different open-source LLMs, especially for users comfortable with the command line. It’s also the go-to tool for homelab and self-hosting enthusiasts who can use Ollama as an AI backend for various applications.
While Ollama itself is primarily a command-line tool, you can enhance its usability by pairing it with OpenWebUI, which provides a graphical interface for interacting with your LLMs.
Primarily command-line based (without OpenWebUI), which may not be suitable for all users.
LM Studio
LM Studio homepage
LM Studio is a platform designed to make it easy to run and experiment with LLMs locally. It offers a range of tools for customizing and fine-tuning your LLMs, allowing you to optimize their performance for specific tasks.
It is excellent for customizing and fine-tuning LLMs for specific tasks, making it a favorite among researchers and developers seeking granular control over their AI solutions.
Pros
Model customization options
Ability to fine-tune LLMs
Track and compare the performance of different models and configurations to identify the best approach for your use case.
Runs on most hardware and major operating systems
Cons
Steeper learning curve compared to other tools
Fine-tuning and experimenting with LLMs can demand significant computational resources.
Jan
Jan chat interface
Jan is another noteworthy option for running LLMs locally. It places a strong emphasis on privacy and security. It can be used to interact with both local and remote (cloud-based) LLMs.
One of Jan's unique features is its flexibility in terms of server options. While it offers its own local server, Jan can also integrate with Ollama and LM Studio, utilizing them as remote servers. This is particularly useful when you want to use Jan as a client and have LLMs running on a more powerful server.
Pros
Strong focus on privacy and security
Flexible server options, including integration with Ollama and LM Studio
Jan offers a user-friendly experience, even for those new to running LLMs locally
Cons
While compatible with most hardware, support for AMD GPUs is still in development.
GPT4All
GPT4All chat interface
GPT4All is designed to be user-friendly, offering a chat-based interface that makes it easy to interact with the LLMs. It has out-of-the-box support for “LocalDocs”, a feature allowing you to chat privately and locally with your documents.
Pros
Intuitive chat-based interface
Runs on most hardware and major operating systems
Open-source and community-driven
Enterprise edition available
Cons
May not be as feature-rich as some other options, lacking in areas such as model customization and fine-tuning.
NextChat
nextchat homepage
NextChat is a versatile platform designed for building and deploying conversational AI experiences. Unlike the other options on this list, which primarily focus on running open-source LLMs locally, NextChat excels at integrating with closed-source models like ChatGPT and Google Gemini.
Pros
Compatibility with a wide range of LLMs, including closed-source models
Robust tools for building and deploying conversational AI experiences
Enterprise-focused features and integrations
Cons
May be overkill for simple local LLM experimentation
Geared towards more complex conversational AI applications.
How to run a local LLM with n8n?
Now that you’re familiar with what local LLMs are, the hardware and software they require, and the most popular tools for running them on your machine, the next step is putting that power to work.
If you're looking to automate tasks, build intelligent workflows, or integrate LLMs into broader systems, n8n offers a flexible way to do just that.
In the following section, we’ll walk through how to run a local LLM with n8n—connecting your model, setting up a workflow, and chatting with it seamlessly using tools like Ollama.
n8n uses LangChain to simplify the development of complex interactions with LLMs such as chaining multiple prompts together, implementing decision making and interacting with external data sources. The low-code approach that n8n uses, fits perfectly with the modular nature of LangChain, allowing users to assemble and customize LLM workflows without extensive coding.
Now, let's also explore a quick local LLM workflow!
With this n8n workflow, you can easily chat with your self-hosted Large Language Models (LLMs) through a simple, user-friendly interface. By hooking up to Ollama, a handy tool for managing local LLMs, you can send prompts and get AI-generated responses right within n8n:
Step 1: Install Ollama and run a model
Installing Ollama is straightforward, just download the Ollama installer for your operating system. You can install Ollama on Windows, Mac or Linux.
After you’ve installed Ollama, you can pull a model such as Llama3, with the ollama pull llama3 command:
terminal command for running Ollama
Depending on the model, the download can take some time. This version of Llama3, for example, is 4.7 Gb.After the download is complete, run ollama run llama3 and you can start chatting with the model right from the command line!
Step 2: Set up a chat workflow
Let’s now set up a simple n8n workflow that uses your local LLM running with Ollama. Here is a sneak peek of the workflow we will build:
n8n workflow with local LLM using Ollama
Start by adding a Chat trigger node, which is the workflow starting point for building chat interfaces with n8n. Then we need to connect the chat trigger to a Basic LLM Chain where we will set the prompt and configure the LLM to use.
Step 3: Connect n8n with Ollama
Connecting Ollama with n8n couldn’t be easier thanks to the Ollama Model sub-node! Ollama is a background process running on your computer and exposes an API on port 11434. You can check if the Ollama API is running by opening a browser window and accessing http://localhost :11434, and you should see a message saying “Ollama is running”.
For n8n to be able to communicate with Ollama’s API via localhost, both applications need to be on the same network. If you are running n8n in Docker, you would need to start the Docker container with the --network=host parameter. That way the n8n container can access any port on the host’s machine.
To set a connection between n8n and Ollama, we simply leave everything as default in the Ollama connection window:
n8n Ollama connection setup
After the connection to the Ollama API is successful, in the Model dropdown you should not see all the models you’ve downloaded. Just pick the llama3:latest model we’ve downloaded earlier.
choosing a local model to use with n8n
Step 4: Chat with Llama3
Next, let's chat with our local LLM! Click the Chat button on the bottom of the workflow page to test it out. Type any message and your local LLM should respond. It’s that easy!
chatting with local LLMs in n8n
Wrap up
Running LLMs locally is not only doable but also practical for those who prioritize privacy, cost savings, or want a deeper understanding of AI.
Thanks to tools like Ollama, which make it easier to run LLMs on consumer hardware, and platforms like n8n, which help you build AI-powered applications, using LLMs on your own computer is now simpler than ever!
What’s next?
Now that you've explored how to run LLMs locally, why not dive deeper into practical applications? Check out these YouTube videos:
Digital forensics plays a critical role in modern cybersecurity — whether it’s responding to a data breach, investigating insider threats, or performing incident analysis after suspicious behavior. In my work as a security-minded engineer and DevSecOps practitioner, I’ve frequently had to identify, collect, and analyze digital evidence across endpoints, servers, and cloud environments.
In this blog post, I’ll walk you through the tools and technologies I rely on to conduct effective digital forensics investigations — categorized by use case.
๐ง What Is Digital Forensics?
At its core, digital forensics is about identifying, preserving, analyzing, and reporting on digital data in a way that’s legally sound and technically accurate. The goal is to reconstruct events, identify malicious activity, and support security incident response.
๐งฐ My Go-To Tools for Digital Forensics Investigations
๐️ Disk & File System Analysis
These tools help examine hard drives, deleted files, system metadata, and more:
Autopsy (The Sleuth Kit) – A GUI-based forensic suite for analyzing disk images, file recovery, and timelines.
FTK Imager – For creating and previewing forensic images without altering the original evidence.
dd / dc3dd – Command-line tools to create low-level forensic disk images in Linux environments.
EnCase (Basic familiarity) – A commercial powerhouse in forensic investigations, used primarily for legal-grade evidence analysis.
๐งฌ Memory Forensics
Memory (RAM) often holds short-lived but critical evidence, like injected malware, live sessions, or loaded processes.
Volatility Framework – Extracts details like running processes, DLLs, command history, network activity, and more from memory dumps.
Rekall – An alternative memory analysis framework focused on automation and deep system state inspection.
✅ I’ve used Volatility to trace injected PowerShell payloads and enumerate hidden processes in live incident simulations.
๐ Network Forensics
Capturing and analyzing network traffic is essential for spotting data exfiltration, command-and-control activity, or lateral movement.
Wireshark – Industry standard for packet analysis and protocol dissection.
tcpdump – Lightweight CLI tool to capture traffic in headless environments or remote systems.
NetworkMiner – Parses PCAP files to extract files, sessions, and credentials automatically.
๐ Log & Timeline Analysis
Understanding what happened — and when — is key to reconstructing incidents.
Timesketch – A timeline analysis tool for visualizing and collaborating on event data.
Log2Timeline (Plaso) – Converts log files, browser histories, and system events into structured timelines.
Sysinternals Suite – Includes gems like Procmon, PsExec, and Autoruns for Windows incident response.
๐งช Malware Analysis (Static & Dynamic)
Understanding what a file does — before or while it runs — helps detect advanced threats and APT tools.
Ghidra – Powerful open-source reverse engineering tool from the NSA for analyzing executables.
x64dbg / OllyDbg – Popular debuggers for inspecting Windows executables.
Hybrid Analysis / VirusTotal – Cloud-based tools to scan files and observe sandbox behavior.
Cuckoo Sandbox – An open-source automated sandbox for observing malware behavior in a VM.
☁️ Cloud & Endpoint Forensics
Modern investigations often span cloud platforms and remote endpoints:
AWS CloudTrail, GuardDuty – Audit user and API activity in cloud environments.
Microsoft Azure Defender – For cloud-native threat detection and log correlation.
CrowdStrike Falcon / SentinelOne – Endpoint Detection and Response (EDR) tools for retrieving artifacts, hunting threats, and isolating compromised machines.
๐งฐ Scripting & Automation
Scripting accelerates collection, triage, and analysis — especially in large-scale environments.
Python – I use it to build custom Volatility plugins, PCAP parsers, or automate alert triage.
Bash / PowerShell – For live memory dumps, log gathering, process inspection, and rapid automation.
๐งฉ MITRE ATT&CK & DFIR Methodology
I map artifacts and behaviors to MITRE ATT&CK techniques (e.g., T1055 – Process Injection) to align with industry standards and communicate findings effectively.
I also follow established methodologies like:
SANS DFIR process
NIST 800-61 Incident Handling Guide
Custom playbooks for containment, eradication, and recovery
✅ Summary: Digital Forensics Tools I Use
๐น Disk & File System Analysis
Autopsy (Sleuth Kit) – GUI-based forensic suite
FTK Imager – Create and inspect forensic images
dd / dc3dd – Low-level disk imaging on Linux
EnCase – Commercial tool for deep disk investigations (basic familiarity)
๐น Memory Forensics
Volatility – Extract processes, DLLs, and sessions from RAM dumps
Rekall – Advanced volatile memory analysis
๐น Network Forensics
Wireshark – Protocol and packet analysis
tcpdump – Command-line traffic capture
NetworkMiner – Extracts files and sessions from PCAP files
๐น Log & Timeline Analysis
Timesketch – Timeline visualization and correlation
Plaso (log2timeline) – Converts raw logs into a forensic timeline
Sysinternals Suite – Live system inspection (Procmon, PsExec, Autoruns)
๐น Malware Analysis
Ghidra – Static reverse engineering
x64dbg / OllyDbg – Debuggers for binary inspection
Hybrid Analysis / VirusTotal – Behavioral analysis and threat intel
AWS CloudTrail / GuardDuty – Monitor API and security activity
Microsoft Defender / Azure Logs – Cloud-native alerting and forensics
CrowdStrike Falcon / SentinelOne – EDR tools for endpoint activity and IOC collection
๐น Scripting & Automation
Python – For custom plugins, log parsers, automation
Bash / PowerShell – For system triage, memory dumps, and log collection
๐น Methodology
Align findings with MITRE ATT&CK
Follow structured DFIR frameworks like SANS, NIST 800-61, and custom playbooks
๐ฏ Final Thoughts
Digital forensics isn’t just for breach responders — it’s a key skill for DevSecOps, SDETs, and any security-conscious engineer. Whether you’re building incident response workflows, simulating attacks, or validating your EDR, knowing how to collect and interpret evidence makes you far more effective.
In today’s software-driven world, APIs are everywhere — powering everything from mobile apps to microservices. But with complexity comes risk. A single missed edge case in an API can crash systems, leak data, or block users. That’s a huge problem.
After years of working on high-scale automation and quality engineering projects, I decided to build something that tackles this challenge head-on:
๐ A Universal API Testing Tool powered by automation, combinatorial logic, and schema intelligence.
This tool is designed not just for test engineers — but for anyone who wants to bulletproof their APIs and catch critical bugs before they reach production.
๐ The Problem with Manual API Testing
Let’s face it: manual API testing, or even scripted testing with fixed payloads, leaves massive blind spots. Here’s what I’ve consistently seen across projects:
๐ Happy path bias: Most tests cover only expected (ideal) scenarios.
❌ Boundary and edge cases are rarely tested thoroughly.
๐งฑ Schema mismatches account for over 60% of integration failures.
๐ Complex, nested JSON responses break traditional test logic.
Even with the best intentions, manual testing only touches ~15% of real-world possibilities. The rest? They’re left to chance — and chance has a high failure rate in production.
๐ก Enter: The Universal API Testing Tool
This tool was created to turn a single API request + sample response into a powerful battery of intelligent, automated test cases. And it does this without relying on manually authored test scripts.
Let’s break down its four core pillars:
๐ 1. Auto-Schema Derivation
Goal: Ensure every response conforms to an expected structure — even when you didn’t write the schema.
Parses sample responses and infers schema rules dynamically
Detects type mismatches, missing fields, and violations of constraints
Supports deeply nested objects, arrays, and edge data structures
Validates responses against actual usage, not just formal docs
๐ง Think of it like “JSON Schema meets runtime intelligence.”
๐งช 2. Combinatorial Test Generation
Goal: Generate hundreds of valid and invalid test cases automatically from a single endpoint.
Creates diverse combinations of optional/required fields
Performs boundary testing using real-world data types
Generates edge case payloads with minimal human input
Helps you shift-left testing without writing 100 test cases by hand
๐ This is where real coverage is achieved — not through effort, but through automation.
๐ 3. Real-Time JSON Logging
Goal: Provide debuggable, structured insights into each request/response pair.
Captures and logs full payloads with status codes, headers, and durations
Classifies errors by type: schema, performance, auth, timeout, etc.
Fully CI/CD compatible — ready for pipeline integration
๐งฉ Imagine instantly knowing which combination failed, why it failed, and what payload triggered it.
๐ 4. Advanced Security Testing
Goal: Scan APIs for common and high-risk vulnerabilities without writing separate security scripts.
Built-in detection for:
XSS, SQL Injection, Command Injection
Path Traversal, Authentication Bypass
Regex-based scans for sensitive patterns (UUIDs, tokens, emails)
Flags anomalies early during development or staging
๐ก️ You don’t need a separate security audit to find the obvious vulnerabilities anymore.
⚙️ How It Works (Under the Hood)
Developed in Python, using robust schema libraries and custom validation logic
Accepts a simple cURL command or Postman export as input
Automatically generates:
Schema validators
Test payloads
Execution reports
Debug mode shows complete request/response cycles for every test case
๐ What You Can Expect
The tool is in developer preview stage — meaning results will vary based on use case — but here’s what early adopters and dev teams can expect:
⏱️ Save 70–80% of manual testing time
๐ Catch 2–3x more bugs by testing combinations humans often miss
⚡ Reduce integration testing time from days to hours
๐ Get built-in security scans with every API run — no extra work required
๐ฌ Your Turn: What’s Your Biggest API Testing Challenge?
I’m actively working on v2 of this tool — with plugin support, OpenAPI integration, and enhanced reporting. But I want to build what developers and testers actually need.
So tell me:
➡️ What’s the most frustrating part of API testing in your projects?
Drop a comment or DM me. I’d love to learn from your use cases.
๐ Work With Me
Need help building test automation frameworks, prepping for QA interviews, or implementing CI/CD quality gates?
Agentic AI and other buzzwords are emerging almost monthly if not more often. In reality they all describe different variations of Agentic Systems, it might be n agentic workflow or multi-agent system, it’s just a different topology under the same umbrella.
If you are considering a career in AI Engineering in 2025, it might feel overwhelming and that is completely normal.
But you need to remember - you are not too late to the game. The role as such has only emerged over the past few years and is still rapidly evolving.
In order to excel in this competitive space, you will need a clear path and focused skills.
Here is a roadmap you should follow if you want to excel as an AI Engineer in today’s landscape.
Fundamentals - learn as you go.
I have always been a believer that learning fundamentals is key to your career growth. This has not changed.
However, I have to admit that the game itself has changed with the speed that the industry is moving forward. Staring of with fundamentals before anything else is no longer an option. Hence, you should be continuously learning them as you build out modern AI Engineering skillset.
Here is a list of concepts and technologies I would be learning and applying in my day-to-day if I were to start fresh.
The Fundamentals.
Python and Bash:
FastAPI - almost all of the backed services implemented in Python are now running as FastAPI servers.
Pydantic - the go to framework for data type validation. It is now also a Python standard for implementing structured outputs in LLM based applications.
uv - the next generation Python package manager. I haven’t seen any new projects not using it.
git - get your software version control fundamentals right.
Asynchronous programming - extremely important in LLM based applications as your Agentic topologies will often benefit from calling multiple LLM APIs asynchronously without blocking.
Learn how to wrap your applications into CLI tools that can be then executed as CLI scripts.
Statistics and Machine Learning:
Understand the non-deterministic nature of Statistical models.
Types of Machine Learning models - it will help you when LLMs are not the best fit to solve non-deterministic problem.
General knowledge in statistics will help you in evaluating LLM based systems.
Don’t get into the trap of thinking that AI Engineering is just Software Engineering with LLMs, some maths and statistics is involved.
LLM and GenAI APIs.
You should start simple, before picking up any LLM Orchestration Framework begin with native client libraries. The most popular is naturally OpenAI’s client, but don’t disregard Google’s genai library, it is not compatible with OpenAI APIs but you will find use cases for Gemini models for sure.
So what should you learn?
LLM APIs.
Types of LLMs:
Foundation vs. Fine-tuned.
Code, conversational, medical etc.
Reasoning Models.
Multi-Modal Models.
Structured outputs:
Learn how OpenAI and Claude enforces structured outputs via function calling and tool use.
Try out simple abstraction libraries like Instructor - they are enough for most of the use cases and uses pydantic for the structure definition natively.
Prompt Caching:
Learn how KV caching helps in reducing generation latency and costs.
Native prompt caching provided by LLM providers.
How LLM serving frameworks implement it in their APIs (e.g. vLLM).
Model Adaptation.
I love the term Model Adaptation. The first time (and maybe the only time) I’ve seen it in literature was in the book “AI Engineering” by
Tool Use is not magic, learn how it is implemented via context manipulation.
Don’t rush to agents yet, learn how LLMs are augmented with tools first.
You might want to pick up a simple LLM Orchestrator Framework at this stage.
Storage and Retrieval.
Storage and Retrieval.
Vector Databases:
Learn strengths and weaknesses of vector similarity search.
Different types of Vector DB indexes: Flat, IVFFlat, HNSW.
When PostgreSQL pgvector is enough.
Graph Databases:
High level understanding about Graph Databases.
Don’t spend too much time here as there is still limited use for Graph DBs even though the promises connected with Graph Retrieval were and still are big.
Current challenges still revolve around the cost of data preparation for Graph Databases.
Hybrid retrieval:
Learn how to combine the best from keyword and semantic retrieval to get the most accurate results.
RAG and Agentic RAG.
RAG and Agentic RAG.
Data Preprocessing:
Learn data clean data before computing Embeddings.
Different chunking strategies.
Extracting useful metadata to be stored next to the embeddings.
Advanced techniques like Contextual Embeddings.
Data Retrieval, Generation and Reranking:
Experiment with amount of data being retrieved.
Query rewriting strategies.
Prompting for Generation with retrieved Context.
Learn how reranking of retrieved results can improve the accuracy of retrieval in your RAG and Agentic RAG systems.
MCP:
Agentic RAG is where MCP starts to play a role, you can implement different data sources behind MCP Servers. By doing so you decouple the domain responsibility of the data owner.
LLM Orchestration Frameworks:
You don’t need to rush with choosing Orchestration Framework, most of them hide the low level implementation from you and you would be better off starting out without any Framework whatsoever and using light wrappers like Instructor instead.
Once you want to pick up and Orchestrator, I would go for the popular ones because that is what you run into in the real world:
LangChain/LangGraph.
CrewAI.
LlamaIndex
Test out Agent SDKs of Hyper-scalers and AI Labs.
AI Agents.
AI Agents.
AI Agent and Multi-Agent Design Patterns:
ReAct.
Task Decomposition.
Reflexion.
Planner-Executor.
Critic-Actor.
Hierarchical.
Collaborative.
…
Memory:
Learn about Long and Short-Term memory in Agentic Systems and how to implement it in real world.
Try out mem0 - the leading Framework in the industry for managing memory. It now also has an MCP server that you can plug into your agents.
Human in or on the loop:
Learn hoe to delegate certain actions back to humans if the Agent is not capable to solve the problem or the problem is too sensitive.
Human in the loop - a human is always responsible for confirming or performing certain actions.
Human on the loop - the Agent decides if human intervention is needed.
A2A, ACP, etc.:
Start learning Agent Communication Protocols like A2A by google or ACP by IBM.
There are more Protocols popping out each week, but the idea is the same.
Internet of Agents is becoming a real thing. Agents are implemented by different companies or teams and they will need to be able to communicate with each other in a distributed fashion.
Agent Orchestration Frameworks:
Put more focus on Agent Orchestration Frameworks defined in the previous section.
Infrastructure.
Infrastructure.
Kubernetes:
Have at least basic understanding of Docker and Kubernetes.
If your current company does not use K8s, it is more likely you will run into the one that does use it rather than the opposite.
Cloud Services:
Each of the major cloud providers have their own set of services meant to help AI builders:
Azure AI Studio.
Google Vertex AI.
AWS Bedrock.
CI/CD:
Learn how to implement Evaluation checks into your CI/CD pipelines.
Understand how Unit Eval Tests are different from Regression Eval Tests.
Load test your applications.
Model Routing:
Learn how to implement Model fallback strategies to make your
Try tools like liteLLM, Orq or Martian.
LLM Deployment:
Learn basics of LLM deployment Frameworks like vLLM.
Don’t focus too much on this as it would be a rare case that you would need to deploy your own models in real world.
Observability and Evaluation.
Observability and Evaluation.
AI Agent Instrumentation:
Learn what SDKs exist for instrumenting Agentic applications, some examples:
Langsmith SDK.
Opik SDK.
Openllmetry.
…
Learn Multi-Agent system Instrumentation. How do we connect traces from multiple agents into a single thread.
You can also dig deeper into OpenTelemetry because most of the modern LLM Instrumentation SDKs are built on top of it.
Observability Platforms:
There are many Observability platforms available off the shelf, but you nee to learn the fundamentals of LLM Observability:
Traces and Spans.
Evaluation datasets.
Experimenting with changes to your application.
Sampling Traces.
Prompt versioning and monitoring.
Alerting.
Feedback collection.
Annotation.
Evaluation Techniques:
Understand the costs associated with LLM-as-a-judge based evaluations:
Latency related.
Monetary related.
Know in which step of the pipeline you should be running evaluations to get most out of it. You will not be able to evaluate every run in production due to cost constraints.
Learn alternatives to LLM based evaluation:
Rule based.
Regex based.
Regular Statistical measures.
Recently, I wrote a piece on building and evolving your Agentic Systems. The ideas I put out are very tightly connected with being able to Observe and Evaluate your systems as they are being built out. Read more here:
Security.
Security.
Guardrails:
Learn how to guardrail inputs to and outputs from the LLM calls.
Different strategies:
LLM based checks.
Deterministic rules (e.g. Regex based).
Try out tools like GuardrailsAI.
Testing LLM based applications:
Learn how to test the security of your applications.
Try to break your own Guardrails and jailbreak from system prompt instructions.
Performing advanced Red Teaming to test emerging attack strategies and vectors.
Looking Forward.
The future development of Agents will be an interesting area to observe. A lot of successful startups are most likely to succeed due to having one of the following:
Distribution.
Good UX.
Real competitive motes, like physical products. Here is where robotics comes into play.
Looking Forward Elements.
Voice, Vision and Robotics:
An interesting blend of capabilities that would allow a physical machine to interact with the world. The areas that I am looking forward to are:
On-device Agents.
Extreme Quantisation techniques.
Foundation Models tuned specifically for robotics purposes.
Automated Prompt Engineering:
New techniques are emerging that allow you to perform automated Prompt Engineering given that you have good test datasets ready for evaluation purposes.
Play around with frameworks like DsPy or AdalFlow.
Summary.
The skillset requirements for AI Engineers are becoming larger every month. The truth is that in your day-to-day you will only need a subset of it.
You should always start with your immediate challenges and adapt the roadmap accordingly.
However, don’t forget to look back and learn the fundamental techniques that power more advanced systems. In many cases these fundamentals are hidden behind layers of abstraction.