#007: Document Action Model (DAM) - An intelligent GxP document generation agent
Authors: Param Nagda & Tanay Agrawal, Project Jupiter Team Members and AI/ML Engineers @ xLM ContinuousLabs
Goal: Develop an agent-based GenAI framework with a chat interface that automates the generation of complex documents used in GxP workflows.
Use Case: An example use case is the development of a Risk Assessment document for a medical device manufacturer. Any change in the manufacturing environment starts with a CR (Change Request) and a Risk Assessment (RA). Typically, such an RA is a complex document that includes assessing risks to product and process, evaluating the impact of the change on existing risks/hazards as well as the introduction of new risks/hazards, listing various related references (Product/Process Risk Analysis, etc.) and populating the required approvers based on the scope of the change.
Such an RA induces migraines for the change owner, multiple trips to various document repositories, and endless Teams chats with colleagues and QA folks.
Outcome: A Jupiter-sized GenAI agent that can handle the generation of a complex document like an RA through an NLP (Natural Language Processing) chat interface.
Imagine a Change Owner inputting the Change Description; Agent Jupiter then generates the document through a few interactions via chat. The output is a DRAFT RA in Word format using the latest template. Voila... what typically takes hours is reduced to minutes.
How does our Jupiter team make this magic happen?
#1 LLM Model - The Foundation
Llama 2 is a groundbreaking family of open-access large language models (LLMs) released by Meta in 2023. Available in different sizes up to 70 billion parameters, Llama 2 models have been shown to perform at a high level, rivaling the capabilities of closed-source giants like ChatGPT and PaLM.
What sets Llama 2 apart is its commitment to safety and transparency - the models demonstrate lower rates of harmful outputs and "hallucination" compared to competitors, and Meta has made the model weights and starting code freely available under a permissive license, lowering barriers to access and use.
Large Language Models (LLMs) such as Llama 2 have revolutionized chatbots by enabling them to understand and generate natural, human-like text. This advancement allows chatbots to process and respond to a wide variety of conversational inputs, maintain context over interactions, and adapt responses according to the user's style and preferences.
Moreover, LLMs enhance chatbots with the ability to manage complex inquiries and engage in more personalized interactions, thanks to their training on diverse datasets and multilingual capabilities.
However, integrating LLMs into chatbots also presents significant challenges, including high operational costs, potential biases from training data, and privacy concerns.
Ensuring ethical deployment and handling of user data responsibly are paramount to mitigate risks like misinformation and data misuse. As the technology evolves, future developments in LLMs are expected to focus on improving contextual understanding, reducing biases, and enhancing user privacy and security, further refining the user experience with chatbots.
Because only limited domain-specific data is available to feed Llama 2, our focus is on providing the concise, specific information needed to train the model to acceptable accuracy and precise file selection.
#2 Our Full Stack Model
The era of the MERN stack:
The MERN stack, an abbreviation for MongoDB, Express.js, React.js and Node.js, has gained immense popularity in the last few years because it provides a single-language advantage: JavaScript is the only primary language. It also offers improved performance and scalability compared to traditional server-side technologies like PHP, along with the flexibility and adaptability of technologies like MongoDB and React.
For these reasons, MERN became extremely popular with developers of all levels across the globe. With JavaScript as its bible, the philosophy of single-language code began to gain popularity.
Birth of a New Stack:
The FARM (FastAPI, React, MongoDB) stack (the new magic bullet) makes integrating models with backend operations more seamless and easier than before.
With this, our backend is capable of generating AI text, communicating like a chatbot, and performing analytics, predictions, and image processing and recognition, without the need for any translation layer to interpret the input and output.
FastAPI
FastAPI deserves a quick mention. A backend framework written in Python, FastAPI has gained popularity in the past few years for its ability to handle CRUD (create, read, update and delete) operations in Python itself. This allows AI and ML models to be effectively integrated into applications at scale.
FastAPI is praised for its comprehensive documentation and developer-friendly features, which can make it easier to work with, especially for Python developers.
FastAPI as a backend provides high performance with benchmarks showing it can handle up to 5,000 requests per second, outperforming other Python web frameworks like Flask and Django.
It is easy to implement concurrent programming using the async/await syntax, without the need to manage event loops or other asynchronous programming complexities.
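To make this concrete, here is a minimal sketch of an async FastAPI endpoint of the kind a chat backend might expose. The route name and the generate_reply helper are illustrative, not from Project Jupiter's codebase:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ChatRequest(BaseModel):
    message: str

async def generate_reply(message: str) -> str:
    # Hypothetical stand-in for a call into an LLM or retrieval script.
    return f"Echo: {message}"

@app.post("/chat")
async def chat(request: ChatRequest):
    # `async def` lets the server keep serving other requests
    # while this one awaits I/O (e.g., a model API call).
    reply = await generate_reply(request.message)
    return {"reply": reply}
```

Run it with `uvicorn main:app --reload`; FastAPI also auto-generates interactive API docs at `/docs`.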
MERN vs. FARM: who is the prevailing champion?
Here's a comparison of some key aspects:
Performance: FastAPI is known for its high performance and low latency. It's built on top of Starlette for the web parts and Pydantic for the data parts, leveraging Python's asynchronous capabilities for efficient I/O handling. On the other hand, Node.js in the MERN stack is single-threaded and event-driven, which can sometimes lead to performance issues if not handled properly.
Scalability: Both stacks can be horizontally scaled by deploying multiple instances of the application behind a load balancer. MongoDB is horizontally scalable by design, allowing you to distribute data across multiple nodes. However, FastAPI's async capabilities can potentially make it more efficient in handling concurrent requests compared to Express in the MERN stack.
Developer Experience: Developer experience can vary depending on individual preferences and familiarity with the technologies involved. Some developers may prefer the simplicity and flexibility of JavaScript and the Node.js ecosystem in the MERN stack, while others may appreciate the type safety and productivity benefits of Python.
Ecosystem and Community: The MERN stack has a larger and more mature ecosystem with a wide range of libraries, tools, and resources available. Node.js has a large and active community, and React is one of the most popular frontend libraries with extensive community support. FastAPI is relatively newer compared to Express, but it's gaining popularity quickly, and its ecosystem is growing rapidly.
Use Cases: Both stacks are suitable for building a wide range of web applications, including CRUD applications, real-time applications, and RESTful APIs. However, FastAPI's async capabilities may make it particularly well-suited for applications that require high concurrency and low latency, such as real-time analytics dashboards or IoT applications.
#3 Function Calling
The function calling feature in Large Language Models (LLMs) is a powerful tool that enables the model to turn unstructured text into structured data by utilizing its world knowledge.
This feature allows users to define functions with specific parameters and prompt the LLM with questions related to those functions. For example, you can define a function like "get_current_weather" with parameters such as location and unit, and then ask the LLM a question like "What is the weather like in Jacksonville?" to receive a structured response based on the defined function.
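As an illustrative sketch of that weather example using the OpenAI Python client (the schema below follows OpenAI's function-calling format; the model name and client version are assumptions):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City, e.g. Jacksonville"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "What is the weather like in Jacksonville?"}],
    tools=tools,
)

# Instead of free text, the model returns a structured function call,
# e.g. get_current_weather(location="Jacksonville").
print(response.choices[0].message.tool_calls)
```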
For Project Jupiter, we have used the function calling feature by integrating OpenAI’s GPT 3.5 API with LangChain. The function calling feature in LangChain involves the use of agents to enhance the capabilities of LLMs by allowing them to call predefined functions during conversations based on input instructions.
For Project Jupiter, we created two functions:
Document Generation - This function generates the required document when activated by the user. It begins by asking a series of questions, prompting follow-up questions tailored to the user's responses. Once all relevant questions have been answered, it automatically produces the document for the user.
Information Retrieval - This function can be called when the user seeks information about any system. Instead of utilizing RAG, we wrote our own Python script for information retrieval.
The GPT API will answer any general-purpose questions based on the information it has been trained on.
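Below is a minimal sketch of how two such functions can be wired up as LangChain tools behind a GPT-3.5 agent. This is illustrative only: the tool bodies are placeholders, and Jupiter's actual question-and-answer flow and retrieval script are not shown here:

```python
from langchain.agents import AgentType, initialize_agent
from langchain.chat_models import ChatOpenAI
from langchain.tools import Tool

def generate_document(change_description: str) -> str:
    # Placeholder: the real function runs a guided Q&A and renders
    # a DRAFT document in Word from the latest template.
    return f"DRAFT RA generated for: {change_description}"

def retrieve_information(query: str) -> str:
    # Placeholder for the custom Python retrieval script.
    return f"System information matching: {query}"

tools = [
    Tool(
        name="document_generation",
        func=generate_document,
        description="Generate a GxP document from a change description.",
    ),
    Tool(
        name="information_retrieval",
        func=retrieve_information,
        description="Look up information about a system.",
    ),
]

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
agent = initialize_agent(tools, llm, agent=AgentType.OPENAI_FUNCTIONS, verbose=True)
agent.run("Generate a risk assessment for a change to the filling line.")
```

Queries that match neither tool fall through to the LLM itself, which is how general-purpose questions get answered.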
#4 Retrieval Augmented Generation (RAG)
Retrieval-Augmented Generation (RAG) is a groundbreaking approach in the realm of generative AI that integrates external knowledge sources with large language models (LLMs) to enhance response accuracy and reliability.
By connecting LLMs to authoritative knowledge bases, RAG enables AI models to provide more precise and up-to-date information to users, fostering trust through source attribution and citations.
With its information retrieval component fetching data from external sources based on user queries, RAG is transforming various industries by enabling AI models to interact with diverse knowledge bases and deliver specialized assistance tailored to specific domains.
For Project Jupiter, we have applied preprocessing techniques to extract necessary data and store it in a master CSV file. The data is then loaded using LangChain’s CSVLoader.
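A sketch of that loading step (the file name master.csv is illustrative):

```python
from langchain.document_loaders import CSVLoader

# Each row of the preprocessed master CSV becomes one Document,
# with column names folded into the page content.
loader = CSVLoader(file_path="master.csv")
documents = loader.load()

print(len(documents))
print(documents[0].page_content[:200])
```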
After loading the data, however, we encountered a correlation issue between the data points: interrelated system information was leading to inaccurate results.
To address this challenge, we experimented with several prompting techniques such as chain of thought and self-ask methods. Additionally, we explored different retrievers and fine-tuned their parameters along with utilizing various retrieval chain techniques provided by LangChain.
We also investigated LlamaIndex, but found no significant difference in results even after experimenting with multiple techniques.
In response to these challenges, we crafted our own Python script for information retrieval tailored to our specific use case. When the user triggers the function calling feature, it invokes the Jupiter Bot.
Subsequently, the user's query undergoes a series of NLP preprocessing steps to extract keywords. Leveraging these keywords, we retrieve all essential information and deliver it back to the user.
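The exact preprocessing pipeline isn't spelled out above, so the following is only a rough sketch of the idea, using plain stopword filtering and substring matching over the master CSV; the stopword list and file name are assumptions:

```python
import csv
import re

# Illustrative stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "is", "what", "of", "for", "in", "about", "tell", "me"}

def extract_keywords(query: str) -> set[str]:
    # Lowercase, tokenize, and drop stopwords.
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    return {t for t in tokens if t not in STOPWORDS}

def retrieve(query: str, csv_path: str = "master.csv") -> list[dict]:
    keywords = extract_keywords(query)
    hits = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            row_text = " ".join(str(v) for v in row.values()).lower()
            # Keep rows that mention every extracted keyword.
            if all(k in row_text for k in keywords):
                hits.append(row)
    return hits

print(retrieve("Tell me about the filling line PLC"))
```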
Conclusion
Our DAM model leverages the Llama foundation model, the FARM stack, and LangChain along with function calling to bring speed, accuracy, and efficiency to generating complex GxP documents.
Once our DAM model generates a document in Word, the end user can edit it and get it approved. In short, we use a Human-in-the-Loop approach for now.
Our final goal is to introduce the concept of Model-in-the-Loop, where a second model performs all the required QA functions, thus achieving end-to-end automation with unprecedented accuracy and speed.
Watch on YouTube
Listen on Spotify
What questions do you have about artificial intelligence in Life sciences? No question is too big or too small. Submit your questions about AI via this survey here.