Unleashing the Power of Generative AI: Building Practical Applications


In the ever-evolving landscape of artificial intelligence, generative AI has emerged as a groundbreaking force, offering a new frontier for creative and practical solutions across various industries. This transformative technology holds the potential to automate processes, personalize experiences, and generate innovative content, propelling the development of applications that were once the domain of science fiction. As the power of machine learning models continues to transcend boundaries, this blog will serve as your navigational chart through the intricacies of generative AI, equipping you with the knowledge and tools necessary to transform abstract concepts into tangible, practical applications that can reshape the way we interact with the digital world. Join us as we embark on a journey to unlock the full capabilities of generative AI, and explore how its application can bring unprecedented value to your projects and endeavors.

Picking the right Generative AI provider for your application

Selecting the ideal generative artificial intelligence (AI) model for your specific application is a critical step that can determine the success of your project. The landscape of generative AI is vast, encompassing various models designed for different tasks, such as natural language processing, image generation, and data synthesis. To ensure that you pick the right generative AI model for your application, you must consider several key factors.

First and foremost, clearly define the objectives of your application. What kind of output are you looking for? Are you aiming to generate text, images, audio, or another form of content? Your goal will directly influence the type of generative AI model you’ll need. For example, if you need to produce human-like text or an interactive chatbot, models like GPT-4 (OpenAI), Bard (Google), Llama 2 (Meta), or Titan (AWS Bedrock) may be suitable. For image generation tasks, however, models such as DALL-E 3 or Stable Diffusion could be more appropriate.

Moreover, in selecting a large language model (LLM) tailored to your application, it is imperative to take into account additional capabilities that align with your audience’s needs, such as multilingual support for a diverse user base. It is also essential to assess whether the application needs to initiate external API calls, execute specific security-related operations, or handle various data payloads. Each of these considerations should be factored into the decision-making process to ensure seamless and secure user interactions within your application.

Optimizing Business Solutions with OpenAI's LLM Use Case Analysis

Each large language model is trained to fulfill distinct purposes, and before using one it is imperative to closely examine its intended application and constraints. For certain intricate use cases, combining two or more models may be required to successfully attain the desired outcome. It is essential to conduct a thorough analysis to ensure the chosen model aligns with the complexity and specificity of the task at hand.

(Chat) gpt-4-1106-preview: used for chatbots and for deciding when an API call should be made. The model will not actually make the call; it prepares the function name and payload so your application can call the API and feed the result back to the model for a final response (or you can respond directly without the model). It currently supports a 128,000-token context window.
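As a rough illustration of that flow, here is a minimal sketch using the openai Python SDK (v1.x). The get_weather function and its parameters are hypothetical; the model only hands back the function name and arguments for your application to execute:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A hypothetical function schema the model can "call".
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city", "unit"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[{"role": "user", "content": "What's the weather in Sydney in celsius?"}],
    tools=tools,
)

# The model does not call the API itself; it returns the name and payload.
tool_call = resp.choices[0].message.tool_calls[0]
print(tool_call.function.name)       # get_weather
print(tool_call.function.arguments)  # {"city": "Sydney", "unit": "celsius"}
```

Your application would then call the real weather API, append the result as a tool message, and ask the model for the final user-facing reply.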

(Embedding) text-embedding-ada-002: used for semantic search. This model converts text into embedding data that you can save in a vector database to perform similarity search, which is practical for knowledge-base search and Q&A applications, finding the right answer much more quickly. It accepts up to 8,191 input tokens and outputs 1,536-dimensional vectors (this value is important for your vector DB schema).
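A minimal sketch of generating one embedding with the openai Python SDK; the input text is illustrative:

```python
from openai import OpenAI

client = OpenAI()

resp = client.embeddings.create(
    model="text-embedding-ada-002",
    input="How do I reset my password?",
)
vector = resp.data[0].embedding

# This dimension is what your vector DB column/schema must match.
print(len(vector))  # 1536
```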

(Speech to Text) whisper-1: converts audio to text, which is useful for taking multilingual audio input from your customers and converting it to text for summarization or for passing to a text-based chatbot such as the GPT-4 chat API. Limited to 25 MB per audio file.
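A minimal transcription sketch; the file name is a placeholder, and the file must stay under the 25 MB limit:

```python
from openai import OpenAI

client = OpenAI()

with open("customer-call.mp3", "rb") as audio_file:  # placeholder file name
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

print(transcript.text)
```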

(Text to Speech) tts-1: a text-to-speech model that enables your application to speak to your audience in close-to-natural voices. It also supports many of the major languages spoken worldwide.
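A minimal sketch, assuming the built-in "alloy" voice and the SDK's helper for writing the audio to disk:

```python
from openai import OpenAI

client = OpenAI()

speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",  # one of the built-in voices
    input="Your order has been shipped and will arrive on Friday.",
)

# Save the generated audio to an MP3 file.
speech.stream_to_file("reply.mp3")
```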

(Vision) gpt-4-vision-preview: allows the model to take in images and answer questions about them, whereas the standard chat models can only accept text.
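Images are passed as part of the message content alongside text; a minimal sketch with a placeholder image URL:

```python
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-4-vision-preview",
    max_tokens=300,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/photo.jpg"}},  # placeholder
        ],
    }],
)

print(resp.choices[0].message.content)
```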

(Generate Images) dall-e-3, dall-e-2: create images from scratch based on a text prompt. DALL-E 2 only: create edited versions of images by having the model replace some areas of a pre-existing image based on a new text prompt, or create variations of an existing image.
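A minimal generation sketch; the prompt and size are illustrative:

```python
from openai import OpenAI

client = OpenAI()

image = client.images.generate(
    model="dall-e-3",
    prompt="A watercolor painting of a robot reading a book",
    size="1024x1024",
    n=1,
)

# The API returns a temporary URL to the generated image.
print(image.data[0].url)
```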

How to start building AI-driven applications

As a software engineer, there are a lot of tools and frameworks in the open-source community that you can use to speed up your application development. But first of all, you need to pick a pre-trained LLM provider such as OpenAI, which has a developer-friendly API here: https://platform.openai.com/docs/introduction

  • Vector Database: Chroma is an open-source embeddings vector database that stores those vectors (thousands of float numbers each), provides fast similarity search, and can save metadata alongside them (a minimal Chroma sketch follows this list). Traditional relational databases also support vector search: PostgreSQL 15+ via the community pgvector extension, which AWS already supports by default in its managed database service. The benefit of using an RDB is that you can join your relational data with vector similarity search, so you don’t need to spend additional time enriching your data from a separate vector DB and can leverage your existing databases.

  • Building a Web Chatbot: Vercel recently released a lot of support and libraries for building AI-driven web solutions, such as the Next.js React full-stack framework with an additional package called Vercel AI, with which you can easily build a streaming chatbot. They also recently released a playground for trying different models and LLM features, along with a chat demo. Using these frameworks and pre-built packages saves engineers a huge amount of time.

  • Building automation tools: using generative models, you can create summary tools for customer support phone calls. Use a speech-to-text model to convert the customer’s voice to text, pass the text to a chat model to understand the intent, then call internal APIs to perform low-risk operations such as gathering user information, and summarize what the phone call is about before routing to a human agent, for example with Amazon Lex + Amazon Connect (a sketch of this pipeline follows the list).

  • Video and browser-side ML: in some cases you might need to run ML on the client side, such as in browsers and mobile apps. There you will need a small model like MobileNetV3 for image or video inference, and Google provides open-source solutions with WASM (WebAssembly): Google MediaPipe has many pre-built libraries that simplify the WebAssembly runtime, so you can do real-time video object detection for people and cars with a small pre-trained model running on the client device, reducing the network dependency (see the MediaPipe sketch after this list).
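For the vector database bullet above, here is a minimal Chroma sketch; the documents and metadata are illustrative, and Chroma embeds the text with its default embedding function unless you supply your own vectors:

```python
import chromadb

# In-memory client; use chromadb.PersistentClient(path="./db") to keep data on disk.
client = chromadb.Client()
collection = client.create_collection("knowledge_base")

collection.add(
    ids=["doc1", "doc2"],
    documents=[
        "Refunds are processed within 5 business days.",
        "Support is available 9am-5pm AEST.",
    ],
    metadatas=[{"topic": "billing"}, {"topic": "support"}],
)

results = collection.query(query_texts=["how long do refunds take?"], n_results=1)
print(results["documents"][0][0])  # the closest matching document
```

With pgvector, the equivalent idea is a `vector(1536)` column plus an ORDER BY distance query, which you can join against your existing relational tables.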
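The automation bullet chains two of the models described earlier; a minimal sketch, with a placeholder file name and prompt:

```python
from openai import OpenAI

client = OpenAI()

# 1. Convert the recorded call to text (placeholder file name).
with open("customer-call.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=audio_file)

# 2. Ask a chat model to extract the intent and summarize for the human agent.
resp = client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[
        {"role": "system",
         "content": "Summarize the customer's intent in two sentences for a support agent."},
        {"role": "user", "content": transcript.text},
    ],
)
print(resp.choices[0].message.content)
```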
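For the client-side bullet, MediaPipe's web (WASM) and Python Tasks APIs follow the same shape; a minimal Python sketch, assuming you have downloaded an EfficientDet-Lite TFLite model from the MediaPipe model zoo:

```python
import mediapipe as mp
from mediapipe.tasks import python as mp_python
from mediapipe.tasks.python import vision

# The model file path is an assumption: fetch it from the MediaPipe model zoo first.
options = vision.ObjectDetectorOptions(
    base_options=mp_python.BaseOptions(model_asset_path="efficientdet_lite0.tflite"),
    score_threshold=0.5,
)
detector = vision.ObjectDetector.create_from_options(options)

image = mp.Image.create_from_file("street.jpg")  # placeholder image
result = detector.detect(image)
for detection in result.detections:
    best = detection.categories[0]
    print(best.category_name, round(best.score, 2))  # e.g. "person 0.87", "car 0.81"
```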

Chatbot workflow

The workflow is vital to the proper functioning of the application, determining when and where to access resources or data. The LLM chat model is designed to accurately identify the intended direction of the conversation, and the application then handles the corresponding action. A clear workflow helps engineers understand how the components interact with each other.

Limitations to be Mindful of in ChatGPT and LLM Chat Models

  • Limited Input Tokens: all LLM models have a predetermined limit on the number of tokens they can process as input, so each inference has a restricted amount of background context available. While the model possesses pre-trained knowledge about the world, it lacks specific domain knowledge about your business or unpublished information; input tokens are the sole means through which the chatbot can receive examples or information pertaining to the current inference. The latest GPT-4 model can handle 128k tokens, roughly 128,000 × 0.7 = 89,600 words, which is large enough to hold background material or previous conversation so the model can give more accurate, relevant responses and solve more complex problems. To tackle this further, OpenAI recently released the beta Assistants API, which lets you pre-configure the chatbot’s behavior, define all available function calls (“tools”), create separate conversations (“threads”), and use “run steps” to handle multiple stages of response for more complex problems. As a result, engineers can avoid repeatedly sending previous messages and function schemas in every chat API call and so avoid hitting the maximum-token error (a minimal Assistants sketch follows this list).

  • Hallucination: as chatbots and LLM models are trained on public data and lack specialized domain knowledge, customers may observe GenAI generating inaccurate responses. To address this, incorporate a knowledge base that provides the model with continually updated background information, so that the chatbot’s responses are grounded in factual information sourced from your domain-knowledge vector database. There are many options: the VoiceFlow knowledge-base API, the Pinecone vector DB (though you need a separate model to embed your text), the OpenAI Assistants API file upload (a beta, file-based knowledge base for GPT), or building your own with a traditional database such as PostgreSQL (a minimal retrieval-grounded sketch follows this list).

  • Guided or Unguided Conversation: when first constructing a chatbot solution or using an LLM to implement intricate business logic, engineers may find it challenging to direct users to the next step or collect multi-step information from them. This is where tools such as Voiceflow and AWS Lex come into play, simplifying the process by letting you design conversational steps in a user-friendly UI. However, the OpenAI GPT models, which have been improved to comprehend background function schemas, offer a more streamlined approach for complex solutions. For instance, given a function schema for retrieving weather data via an API with inputs of city name and measurement unit (as sketched earlier), the model can intelligently guide the conversation itself. In contrast, Voiceflow would require defining two separate steps to obtain the city name and the measurement unit, and if the API changed, the Voiceflow steps would need to be modified accordingly. For long-term viability, a GPT + API function-schema approach, letting the generative AI determine the appropriate course of action, is the preferable choice.
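A minimal sketch of the beta Assistants API flow described above; the instructions and user message are illustrative:

```python
import time
from openai import OpenAI

client = OpenAI()

# Configure behavior (and, optionally, tools) once, instead of resending per call.
assistant = client.beta.assistants.create(
    model="gpt-4-1106-preview",
    instructions="You are a helpful support agent.",  # placeholder instructions
)

thread = client.beta.threads.create()  # one thread per conversation
client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="Where is my order?",
)

run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)

# Poll until the run (and its run steps) finish, then read the reply.
while run.status in ("queued", "in_progress"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

messages = client.beta.threads.messages.list(thread_id=thread.id)
print(messages.data[0].content[0].text.value)
```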
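To ground responses in a knowledge base, the common pattern is to retrieve the most similar documents and pass them to the chat model as context; a minimal sketch, reusing the Chroma collection from the earlier example:

```python
import chromadb
from openai import OpenAI

client = OpenAI()
kb = chromadb.Client().get_or_create_collection("knowledge_base")  # populated earlier

question = "How long do refunds take?"
context_docs = kb.query(query_texts=[question], n_results=3)["documents"][0]

resp = client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[
        {"role": "system",
         "content": "Answer only from the context below. If the answer is not "
                    "there, say you don't know.\n\nContext:\n" + "\n".join(context_docs)},
        {"role": "user", "content": question},
    ],
)
print(resp.choices[0].message.content)
```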

Summary

The ongoing development and availability of large language model (LLM) frameworks have significantly lowered the barriers to entry for incorporating advanced AI capabilities into modern applications. These models can be easily accessed and utilized via straightforward RESTful API interfaces, making this a particularly opportune moment for the surge of generative AI (GenAI)-powered applications.

Leveraging the robust infrastructure provided by cloud-hosting services along with the efficiency of contemporary application frameworks, the deployment cycle for AI-integrated applications is now remarkably accelerated. This confluence of technological advancements marks an ideal juncture for developers and organizations to embark on the creation of their own AI-driven solutions. With the tools and services at hand, the delivery of sophisticated, AI-enhanced applications can be achieved with unprecedented speed and agility.
