How we designed a multi-tenant, multi-cloud, and multi-model AI platform with Advanced RAG - Fiber Copilot

This article describes how Tech Fabric designed a multi-tenant, multi-cloud, and multi-model generative AI system that can be used to quickly create AI chatbots and copilots trained on enterprise data.

Tech Fabric is a Digital Transformation company that has been helping enterprises embrace the latest advancements in technology to improve their operational efficiencies, reduce costs, and deliver value to their internal employees or customers.

Our customers rely on our guidance regarding the technology landscape and how they can leverage newer technological advancements to be more effective, disruptive, and competitive.

Whether it’s implementing better ways of managing data, providing better solutions to integrate with their partners, or building a really pleasant UX for their systems, we’ve always strived to be their trusted adviser: bringing them value through the right architecture, tools, frameworks, and solutions to address their needs and help them stay ahead of their competition.

With the recent advancements in the Generative AI space, we’ve had many conversations with our clients about how they can take advantage of Large Language Models and introduce Generative AI experiences into their applications to democratize access to data, simplify their current workflows, and help their employees be more productive.

Since every client and their needs are different, we often had to build custom solutions tailored to them, and found ourselves doing the same thing repeatedly. With so many components and moving pieces in a Generative AI system, there’s no way to build a one-size-fits-all system; we choose components based on the cost, speed, and features appropriate for each client.

A typical Generative AI system has these building blocks. Depending on the company and its use case, we often have to choose different components, ranging from native cloud provider services to third-party and custom-built components:

  1. Vector Database (Azure AI Search, Qdrant, Cloudflare Vectorize, Elasticsearch, Vertex AI Search, etc.)
  2. Document Parsing Engine (Azure Document Intelligence, LlamaParse, etc.)
  3. AI Model Serving Platform (Azure OpenAI, AWS Bedrock, etc.)
  4. Knowledge Graphs
  5. Intelligent Data Platform (Databricks, Snowflake, Microsoft Fabric, etc.)
  6. File Storage (Azure Blob Storage, AWS S3)
  7. Retrieval Augmented Generation (RAG) techniques
  8. Workflow Orchestration for continuous training (Temporal)
  9. Identity Access Management (Microsoft Entra ID, Ping, Okta, etc.)

and many more…

As you can see, a few core components are absolutely essential to every meaningful Generative AI platform. But not every enterprise uses the same cloud platform, has the same set of requirements, or is even trying to solve the same problem.

Some of them have tons of documents in unstructured formats (PDFs, spreadsheets, Word docs, handwritten notes) accumulated over decades. Some of them have intelligent data platforms with advanced capabilities. Some of them have semi-structured data and traditional relational databases.

Each scenario calls for a different parsing engine suited to the customer’s data, a different vector database, a different foundational AI model fine-tuned for their use case, and so on.

For this reason, we often found ourselves building a custom platform for each client, with manual setup and integration. That approach isn’t future-proof, isn’t cost-effective for our customers, and can quickly go out of date given the pace of innovation in this space.

There’s got to be a better way to rapidly provision infrastructure and swap out components as newer, better ones arrive. We also wanted a platform that can host multiple tenants and provide an amazing user experience (UX) for ingesting, processing, training, and deploying AI chatbots and copilots on the fly.

Introducing Fiber Copilot Platform

That’s why we decided to build Fiber, a new multi-tenant, multi-cloud, and multi-model AIOps platform that can be used to quickly create and deploy AI chatbots and copilots trained on enterprise data.

Users have full flexibility to choose among options for each component in the system and to provision the underlying infrastructure at the click of a button, reducing months’ worth of work to a few hours.

For example, they can choose Elasticsearch for vector search, Azure Document Intelligence as the parsing engine, and Azure OpenAI for model serving, and deploy it all to Azure. Or they can choose Qdrant for vector search, LlamaParse as the parsing engine, and AWS Bedrock for model serving, and deploy to AWS. Many more options are available, ranging from native cloud provider services to third-party and open-source components.
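Conceptually, each tenant's selection boils down to one choice per component slot, validated against a catalog of supported options. Here's a minimal, stdlib-only sketch of that idea; the catalog entries and field names are illustrative, not Fiber's actual API:

```python
from dataclasses import dataclass

# Hypothetical catalog of supported options per component slot;
# the real platform's catalog and identifiers may differ.
SUPPORTED = {
    "cloud": {"azure", "aws"},
    "vector_store": {"azure-ai-search", "qdrant", "elasticsearch",
                     "cloudflare-vectorize", "vertex-ai-search"},
    "parsing_engine": {"azure-document-intelligence", "llamaparse"},
    "model_serving": {"azure-openai", "aws-bedrock"},
}

@dataclass(frozen=True)
class TenantStack:
    """One tenant's component choices, checked against the catalog."""
    cloud: str
    vector_store: str
    parsing_engine: str
    model_serving: str

    def validate(self) -> list:
        """Return a list of human-readable problems (empty if valid)."""
        problems = []
        for slot in ("cloud", "vector_store", "parsing_engine", "model_serving"):
            value = getattr(self, slot)
            if value not in SUPPORTED[slot]:
                problems.append(f"unsupported {slot}: {value!r}")
        return problems

# The Azure example from the text: Elasticsearch + Document Intelligence + Azure OpenAI.
azure_stack = TenantStack(
    cloud="azure",
    vector_store="elasticsearch",
    parsing_engine="azure-document-intelligence",
    model_serving="azure-openai",
)
assert azure_stack.validate() == []
```

A validated `TenantStack` like this is the kind of input a provisioning workflow can act on deterministically.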

It’s their choice: which platform to deploy to and which components best fit their use case are entirely up to them. Fiber gives them total control over their data privacy and secure access to their cloud infrastructure.

Fiber enables companies to quickly create chatbots and copilots trained on their proprietary data and deploy them across the enterprise under granularly controlled permissions and security policies.

Multi-Tenant Architecture

Here’s the high-level architecture.

We’ve decided to build a multi-tenant system that can be easily provisioned while fully preserving our customers’ data privacy and control over their data.

Each tenant will have all their resources provisioned in their own cloud subscription. All their data, vector stores, AI models, and parsing engines are provisioned on their cloud platform of choice. Fiber applications and APIs are granted granular access to resources based on the permissions defined in the Identity and Access Management system and authenticated with OAuth 2.0. The entire infrastructure provisioning process is automated and orchestrated with Temporal workflows and their durable execution capabilities.
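In Fiber, Temporal drives the real provisioning flow. As a rough, stdlib-only sketch of its shape (step names and return values here are purely illustrative; production steps would call cloud provider APIs), provisioning is an ordered sequence of idempotent steps, each retried on transient failure:

```python
import time

def provision_tenant(tenant_id: str, steps, max_attempts: int = 3) -> dict:
    """Run provisioning steps in order, retrying transient failures.

    `steps` is an ordered list of (name, callable) pairs; each callable
    is assumed to be idempotent, so retrying it is safe.
    """
    results = {}
    for name, step in steps:
        for attempt in range(1, max_attempts + 1):
            try:
                results[name] = step(tenant_id)
                break  # step succeeded; move to the next one
            except Exception:
                if attempt == max_attempts:
                    raise  # give up after the last attempt
                time.sleep(0.01 * 2 ** attempt)  # toy exponential backoff

    return results

# Illustrative steps mirroring the flow described above: create the
# tenant's cloud resources, provision the vector store, grant access.
steps = [
    ("create_subscription_resources", lambda t: f"rg-{t}"),
    ("provision_vector_store",        lambda t: f"search-{t}"),
    ("grant_iam_permissions",         lambda t: f"oauth-scopes-{t}"),
]
print(provision_tenant("acme", steps))
```

With Temporal, the retry policy, backoff, and the "resume where we left off" behavior are handled by the orchestrator rather than hand-rolled loops like this one.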

Tying it all together with Temporal

Since we depend on so many external cloud providers and their APIs, we needed durable, deterministic execution of the processes in our system. Temporal’s durable execution framework guarantees that no matter what happens (process crashes, network or storage outages), its orchestrator helps failed processes recover by rehydrating them on a different server and continuing the flow of execution.

The Temporal framework is the backbone of all our underlying workflows. We execute workflows when onboarding new clients and users, provisioning infrastructure, training AI models, creating vector embeddings, and more.

The beauty of Temporal is that it works with our pre-existing choices of runtime, test framework, CI/CD environment, and web framework. We don’t need to adopt a specific language or server technology to use Temporal workflows. We have workloads written in Python, TypeScript, and C# on .NET Core, alongside infrastructure code in Bicep and Terraform, and Temporal works with every one of those technologies.

We haven’t found another workflow orchestration engine that’s as flexible as Temporal and supports all the major programming languages. It’s truly a joy to work with, and we couldn’t be more excited.

Fiber’s flexible architecture allows us to quickly provision a Generative AI platform for our clients so they can jump right into creating AI chatbots, and it’s future-proof, too. If newer models arrive, a better vector database emerges, or a newer parsing engine works better, swapping components takes only a few clicks, and Temporal workflows do the rest: retraining the AI model and repopulating the vector store!

Check out Fiber Copilot and reach out if you’d like to try it out!
