Search the Community
Showing results for tags 'gen ai'.
-
Last week, Dr. Matt Wood, VP for AI Products at Amazon Web Services (AWS), delivered the keynote at the AWS Summit Los Angeles. Matt and guest speakers shared the latest advancements in generative artificial intelligence (generative AI), developer tooling, and foundational infrastructure, showcasing how they come together to change what's possible for builders. You can watch the full keynote on YouTube.

Announcements during the LA Summit included two new Amazon Q courses as part of Amazon's AI Ready initiative to provide free AI skills training to 2 million people globally by 2025. The courses are part of the Amazon Q learning plan. But that's not all that happened last week.

Last week's launches

Here are some launches that got my attention:

- LlamaIndex support for Amazon Neptune — You can now build Graph Retrieval Augmented Generation (GraphRAG) applications by combining knowledge graphs stored in Amazon Neptune with LlamaIndex, a popular open source framework for building applications with large language models (LLMs) such as those available in Amazon Bedrock. To learn more, check the LlamaIndex documentation for Amazon Neptune Graph Store.

- AWS CloudFormation launches a new parameter called DeletionMode for the DeleteStack API — You can use the AWS CloudFormation DeleteStack API to delete your stacks and stack resources. However, certain stack resources can prevent the DeleteStack API from completing successfully, for example, when you attempt to delete non-empty Amazon Simple Storage Service (Amazon S3) buckets. The DeleteStack API can enter the DELETE_FAILED state in such scenarios. With this launch, you can now pass the FORCE_DELETE_STACK value to the new DeletionMode parameter and delete such stacks (see the sketch after this list). To learn more, check the DeleteStack API documentation.

- Mistral Small now available in Amazon Bedrock — The Mistral Small foundation model (FM) from Mistral AI is now generally available in Amazon Bedrock. This is a fast follow to our recent announcements of Mistral 7B and Mixtral 8x7B in March, and Mistral Large in April. Mistral Small, developed by Mistral AI, is a highly efficient large language model (LLM) optimized for high-volume, low-latency language-based tasks. To learn more, check Esra's post.

- New Amazon CloudFront edge location in Cairo, Egypt — The new AWS edge location brings the full suite of benefits provided by Amazon CloudFront, a secure, highly distributed, and scalable content delivery network (CDN) that delivers static and dynamic content, APIs, and live and on-demand video with low latency and high performance. Customers in Egypt can expect up to 30 percent improvement in latency, on average, for data delivered through the new edge location. To learn more about AWS edge locations, visit CloudFront edge locations.

- Amazon OpenSearch Service zero-ETL integration with Amazon S3 — This Amazon OpenSearch Service integration offers a new, efficient way to query operational logs in Amazon S3 data lakes, eliminating the need to switch between tools to analyze data. You can get started by installing out-of-the-box dashboards for AWS log types such as Amazon VPC Flow Logs, AWS WAF Logs, and Elastic Load Balancing (ELB). To learn more, check out the Amazon OpenSearch Service Integrations page and the Amazon OpenSearch Service Developer Guide.

For a full list of AWS announcements, be sure to keep an eye on the What's New at AWS page.
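The CloudFormation change above is the kind you would typically exercise from the AWS SDK. As a hedged sketch (not part of the announcement), here is how forcing deletion of a stack stuck in DELETE_FAILED might look with boto3; the stack name is a placeholder, and the parameter name should be verified against the current boto3 documentation:

```python
import boto3

cfn = boto3.client("cloudformation", region_name="us-east-1")

# DeletionMode="FORCE_DELETE_STACK" is only meaningful for a stack that is
# already in the DELETE_FAILED state; "my-stuck-stack" is a hypothetical name.
cfn.delete_stack(
    StackName="my-stuck-stack",
    DeletionMode="FORCE_DELETE_STACK",
)
cfn.get_waiter("stack_delete_complete").wait(StackName="my-stuck-stack")
```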
Other AWS news

Here are some additional news items and a Twitch show that you might find interesting:

- Build On Generative AI — Now streaming every Thursday, 2:00 PM US PT on twitch.tv/aws, my colleagues Tiffany and Mike discuss different aspects of generative AI and invite guest speakers to demo their work. Check out show notes and the full list of episodes on community.aws.

- Amazon Bedrock Studio bootstrapper script — We've heard your feedback! To everyone who struggled setting up the required AWS Identity and Access Management (IAM) roles and permissions to get started with Amazon Bedrock Studio: You can now use the Bedrock Studio bootstrapper script to automate the creation of the permissions boundary, service role, and provisioning role.

Upcoming AWS events

Check your calendars and sign up for these AWS events:

- AWS Summits — It's AWS Summit season! Join free online and in-person events that bring the cloud computing community together to connect, collaborate, and learn about AWS. Register in your nearest city: Dubai (May 29), Bangkok (May 30), Stockholm (June 4), Madrid (June 5), and Washington, DC (June 26–27).

- AWS re:Inforce — Join us for AWS re:Inforce (June 10–12) in Philadelphia, PA. AWS re:Inforce is a learning conference focused on AWS security solutions, cloud security, compliance, and identity. Connect with the AWS teams that build the security tools and meet AWS customers to learn about their security journeys.

- AWS Community Days — Join community-led conferences that feature technical discussions, workshops, and hands-on labs led by expert AWS users and industry leaders from around the world: Midwest | Columbus (June 13), Sri Lanka (June 27), Cameroon (July 13), New Zealand (August 15), Nigeria (August 24), and New York (August 28).

You can browse all upcoming in-person and virtual events. That's all for this week. Check back next Monday for another Weekly Roundup!

— Antje

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS! View the full article
-
- amazon neptune
- aws cloudformation
- (and 5 more)
-
In our previous post, we discussed how to generate images using Stable Diffusion on AWS. In this post, we will guide you through running LLMs for text generation in your own environment with a GPU-based instance in a few simple steps, empowering you to create your own solutions.

Text generation, a trending focus in generative AI, facilitates a broad spectrum of language tasks beyond simple question answering. These tasks include content extraction, summary generation, sentiment analysis, text enhancement (including spelling and grammar correction), code generation, and the creation of intelligent applications like chatbots and assistants.

In this tutorial, we will demonstrate how to deploy two prominent large language models (LLMs) on a GPU-based EC2 instance on AWS (G4dn) using Ollama, an open source tool for downloading, managing, and serving LLM models. Before getting started, ensure you have completed our technical guide for installing NVIDIA drivers with CUDA on a G4dn instance.

We will use Llama2 and Mistral, both strong contenders in the LLM space with open source licenses suitable for this demo. While we won't explore the technical details of these models, it is worth noting that Mistral has shown impressive results despite its relatively small size (7 billion parameters fitting into an 8GB VRAM GPU). Llama2, for its part, provides a range of models for various tasks, all available under open source licenses, making it well suited for this tutorial. To experiment with question-answer models similar to ChatGPT, we will use the fine-tuned versions optimized for chat or instruction (Mistral-instruct and Llama2-chat), as the base models are primarily designed for text completion. Let's get started!

Step 1: Installing Ollama

To begin, open an SSH session to your G4dn server and verify the presence of NVIDIA drivers and CUDA by running:

nvidia-smi

Keep in mind that you need to have the SSH port open, the key pair created or assigned to the machine during creation, the external IP of the machine, and software like ssh for Linux or PuTTY for Windows to connect to the server. If the drivers are not installed, refer to our technical guide on installing NVIDIA drivers with CUDA on a G4dn instance.

Once you have confirmed the GPU drivers and CUDA are set up, proceed to install Ollama. You can opt for a quick installation using their binary, or choose to clone the repository for a manual installation. To install Ollama quickly, run the following command:

curl -fsSL https://ollama.com/install.sh | sh

Step 2: Running LLMs on Ollama

Let's start with the Mistral model and view the results by running:

ollama run mistral

This instruction will download the Mistral model (4.1GB) and serve it, providing a prompt for immediate interaction with the model. Not a bad response for a prompt written in Spanish! Now let's experiment with a prompt to write code: Impressive indeed. The response is not only generated rapidly, but the code also runs flawlessly, with basic error handling and explanations. (Here's a pro tip: consider asking for code comments, docstrings, and even test functions to be incorporated into the code.) Exit with the /bye command.

Now, let's enter the same prompt with Llama2. We can see that there are immediate, notable differences. This may be due to the training data it has encountered, as it defaulted to a playful and informal chat-style response. Let's try Llama2 using the same code prompt from above: The results of this prompt are quite interesting.
Following four separate tests, it was clear that the generated responses contained not only broken code but also inconsistencies within the responses themselves. It appears that writing code is not one of the out-of-the-box capabilities of Llama2 in this variant (7B parameters, although there are also code-specialized versions like Code Llama), but results may vary.

Let's run a final test with Code Llama, a Llama model fine-tuned to create and explain code. We will use the same prompt from above to write the code: This time, the response is improved, with the code functioning properly and a satisfactory explanation provided. You now have the option to either continue exploring directly through this interface or start developing apps using the API (see the sketch after the Final thoughts section below).

Final test: A chat-like web interface

We now have something ready for immediate use. However, for some added fun, let's install a chat-like web interface to mimic the experience of ChatGPT. For this test, we are going to use ollama-ui (https://github.com/ollama-ui/ollama-ui). ⚠︎ Please note that this project is no longer being maintained and users should transition to Open WebUI, but for the sake of simplicity, we are going to stick with the ollama-ui front end.

In your terminal window, clone the ollama-ui repository by entering the following command:

git clone https://github.com/ollama-ui/ollama-ui

Here's a cool trick: when you run Ollama, it creates an API endpoint on port 11434. However, ollama-ui will run and be accessible on port 8000, so we'll need to ensure both ports are securely accessible from our machine. Since we are currently running a development service (without the security features and performance of a production web server), we will establish an SSH tunnel for both ports. This setup will enable us to access these ports exclusively from our local computer over an encrypted connection.

To create the tunnel for both the web UI and the model's API, close your current SSH session and open a new one with the following command:

ssh -L 8000:localhost:8000 -L 11434:127.0.0.1:11434 -i myKeyPair.pem ubuntu@<Machine_IP>

Once the tunnel is set up, navigate to the ollama-ui directory in a new terminal and run the following commands:

cd ollama-ui
make

Next, open your local browser and go to 127.0.0.1:8000 to enjoy the chat web interface! While the interface is simple, it enables dynamic model switching, supports multiple chat sessions, and facilitates interaction beyond the terminal (aside from tunneling). This offers an alternative method for testing the models and your prompts.

Final thoughts

Thanks to Ollama and how simple it is to install the NVIDIA drivers on a GPU-based instance, running LLMs for text generation in your own environment is a very straightforward process. Additionally, Ollama facilitates the creation of custom model versions and fine-tuning, which is invaluable for developing and testing LLM-based solutions.

When selecting the appropriate model for your specific use case, it is crucial to evaluate its capabilities based on its architecture and the data it has been trained on. Be sure to explore fine-tuned variants such as Code Llama, as well as specialized versions tailored for generating Python code. Lastly, for those aiming to develop production-ready applications, remember to review the model license and plan for scalability, as a single GPU server may not suffice for multiple concurrent users.
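Here is the API sketch referenced above. This is illustrative only and not part of the original tutorial: a minimal Python call to Ollama's local HTTP endpoint on port 11434 (reachable through the SSH tunnel), assuming the mistral model pulled earlier; the prompt is a placeholder.

```python
import json
import urllib.request

# Hypothetical prompt; the model name matches the one pulled with `ollama run mistral`.
payload = {"model": "mistral", "prompt": "Write a haiku about GPUs.", "stream": False}

req = urllib.request.Request(
    "http://127.0.0.1:11434/api/generate",       # Ollama's default API port
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])   # generated text
```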
You may want to explore Amazon Bedrock, which offers easy access to various versions of these models through a simple API call, or Canonical MLOps, an end-to-end solution for training and running your own ML models.

Quick note regarding model size

The size of the model significantly impacts the quality of the results. A larger model is more capable of producing better content (since it has a greater capacity to "learn"). Additionally, larger models offer a larger attention window (for "understanding" the context of the question) and allow more tokens as input (your instructions) and output (the response).

As an example, Llama2 offers three main model sizes in terms of parameter count: 7, 13, or 70 billion parameters. The first requires a GPU with a minimum of 8GB of VRAM, whereas the second requires a minimum of 16GB of VRAM.

Let me share a final example: I will request the 7B-parameter version of Llama2 to proofread an incorrect version of the simple Spanish phrase "¿Hola, cómo estás?", which translates to "Hi, how are you?" in English. I conducted numerous tests, all yielding incorrect results like the one displayed in the screenshot (where "óle" is not a valid word, and it erroneously suggests it means "hello"). Now, let's test the same example with Llama2 with 13 billion parameters: while it failed to recognize that I intended to write "hola," this outcome is significantly better, as it added accents and question marks and detected that "ola" wasn't the right word to use (if you are curious, it means "wave"). View the full article
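As an aside on the model sizes discussed above, here is a rough, purely illustrative back-of-envelope (my own numbers, not from the article). It assumes the 4-bit quantized weights Ollama typically serves, with an overhead factor standing in for the KV cache and runtime allocations; actual usage varies with quantization and context length.

```python
# Rough estimate only; real usage depends on quantization, context length, and runtime.
def approx_vram_gb(params_billion: float, bits_per_weight: int = 4, overhead: float = 1.3) -> float:
    weights_gb = params_billion * 1e9 * bits_per_weight / 8 / 1e9
    return weights_gb * overhead

for size in (7, 13, 70):
    print(f"Llama2 {size}B ~ {approx_vram_gb(size):.1f} GB VRAM")
# ~4.6, ~8.5, and ~45.5 GB, broadly consistent with the 8 GB and 16 GB minimums quoted above
```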
-
Having frequently worked with governments around the world over the course of my career, I've had all kinds of discussions about the global impact of generative AI. Today, I'm publicly wading into those waters to deliver my perspective, and my opinion is that … it's incredibly hard to predict the future. Done. Wrapped up this entire post in a single sentence.

All joking aside, there's a great deal of hype around gen AI, with predictions that it will have a huge impact on office workers, our everyday way of life, wealth disparity, the future of work, education — you name it. For my part, I believe there certainly will be impacts, but I'm reluctant to make specific predictions. However, I think certain insights can help us prepare. A general framing of where this particular innovative technology might lead us can be helpful — especially for those developing data strategies, AI capabilities and technology transformations across government. Not only do government leaders need to consider how they evolve as mission-driven organizations, but also how their outsized effect on citizens needs to account for this revolution.

Formulating a revolution

Let's take a look at a macro-framework for how technology creates revolutions and then apply it to gen AI. Here is the basic formula:

Infrastructure + Products = Revolution in X

A revolutionary innovation requires infrastructure that makes the underlying technology readily available. Products align the innovation to answer specific value requirements or use cases. These two aspects democratize the usage and make an innovation cost efficient enough to create a revolution. Another way to describe the equation is that it takes an ecosystem of specialized products on top of an expansive infrastructure for an innovation to change the world. This is easier to see with examples from history. Here are two previous technology revolutions viewed through this framing:

Electricity: [electric grid] + [electric consumer products] = better form of energy transfer (vs. coal or wood). The electric grid plus electricity-based products such as lights or computers allowed for an innovative way to transfer energy to transform the world.

Internet: [telco networks] + [software and hardware] = better form of data transfer (vs. paper or fax). A digital telecommunications network plus data-leveraging products allowed for an innovative way to transfer data to change the world. (In this case, the early infrastructure leveraged existing telco networks.)

This basic model can be applied to a number of revolutionary technologies such as the combustion engine, currency, the printing press and more. So what would AI/ML look like in this model?

Infrastructure = data
Products = algorithms

If data is the infrastructure in our equation and algorithms the product, what then is the X factor? I think X in this equation would be a better form of functions (those that are more complex and accurate), which can be thought of as probabilistic models of reality. This isn't something new — we've already modeled economies, financial trends, businesses, even golf. Physics is a mathematical model of reality. But what happens when we can do this easily and accurately with small sets of data? What happens when everyone can do this without taking graduate-level statistics and modeling? In a generation or less, dieticians could model ideal healthful diets for patients and society could model optimized learning pathways for students.
On top of that, they'd be able to share individual functions and outcomes for an incredible network effect. This algorithmic thinking, at scale and across society, will launch a revolution. Where do we use humans today to essentially perform a set of complex functions? Examples of work likely to be redefined and augmented by AI include the collecting of medical diagnostics, financial advising, and more. Now think about a society in which those functions are easy to create, customize and share.

There is much to unpack when we frame the AI revolution in this way, but I'll say this: I spend a lot of time working with governments and helping them adjust their perspective to see that data is infrastructure, on top of the traditional concept of infrastructure (cloud). We strategize together on the second- and third-order implications of this perspective, such as how this data infrastructure needs to be architected not just for the products we know about today, but also for those yet to be imagined. Crawl, walk, run.

Language is a reflection of ourselves

Disinformation attacks are only going to get worse as we head into key elections around the world. AI can be used to generate increasingly convincing fakes. We have more bad actors leveraging disinformation than ever, and this problem will only get worse because of large language models (LLMs). While I said I was unwilling to make a prediction on the future impact of AI, I'll wager that a malicious nation-state somewhere out there is already researching how to use LLMs to make disinformation campaigns worse. And they're not prompting the LLM for fake news; they're using it for what it is: a probabilistic representation of society.

Let's use GPT-4 as an example. It is a highly complex statistical model that represents the data it was trained on. Where does that data come from, you ask? It comes from the internet, textbooks, social media and many other sources. This model is fantastic at generating responses to our prompts because it so closely represents us as a society. I'm thinking of a quote from one of my favorite novels, Babel by fantasy writer R.F. Kuang: "Languages aren't just made of words. They're modes of looking at the world. They're the keys to civilization."

Because they are based on language, LLMs are also "modes of looking at the world." There is a good amount of research in this area. We've seen researchers use LLMs in economics to simulate many individuals and the decisions they'd make. Others have used them to predict partisan reactions to various politically charged responses. One researcher fed an LLM a specific media diet to predict public opinion. I talked earlier about the democratization of these functions, but let's dive into the implications of what a complex function means in reality. An LLM trained on the data of a society represents a view of that society. What can that view be used for? What can it tell us about ourselves that we don't know?

Opportunities and threats

When we think about LLMs, it shouldn't be all doom and gloom. A strengths, weaknesses, opportunities and threats (SWOT) analysis rightfully places opportunities and threats together because they coexist. There's a huge potential for LLMs to have a positive impact on the world. This simulation function means governments can pre-test domestic policies for societal impacts before they are implemented. New laws and government programs can be tested for unknown negative externalities. Our own intelligence agencies can use these models to help keep us safe.
GPT-4 cost $100 million to train. Would the U.S. intelligence community be willing to pay $100 million to have an accurate model of another country's decision-making processes? How about a set of functions that model key nation-states and how they interact?

As gen AI models become more ubiquitous, we also face the distinct risk of regression to the mean. This means extended AI usage gravitates around the averages in our models. So society ends up producing similar tasting products, similar sounding songs, similar style books and movies, and so on. Commercialism already drives some of this, but LLMs could accelerate regression to the mean. What we lose are the happy accidents and serendipity. We could lose the benefits of diversity. This is something that policy-makers across government should seriously consider. Hopefully, the incredible insights that LLMs bring help us better understand each other. Despite the many risks, I believe we'll find we're much more alike than different, and there are many paths to cooperation across governments in the global community.

Moving beyond LLMs

Gen AI has captured the imagination of people everywhere with its very human-like outputs of conversations, writing, music, images and more. These interactions with AI can feel amazingly natural and authentic, and occasionally surprising in delightful or humorous ways. However, gen AI is not only for human interaction — other models can be used effectively for analytic and business applications that differ from LLMs. Let's dig into some examples, all explained at an executive level, and how businesses might deploy these.

To understand how these gen AI models work, we need to understand how a generative algorithm works. The simplest explanation is that we enter a prompt, which is converted into a set of numbers (a "numeric input"), and that is entered into the function. The function then generates the output. It's not unlike sixth grade algebra, when we took a function and plugged in x to calculate y. The key difference is that in order to get the y output to be as detailed as a generated image, the function and the input x must be extremely complex.

Understanding GANs and VAEs

But how does the algorithm know how to convert our input into something we understand? This is where we get into how specific models are trained. Let's look at two generative models called generative adversarial networks (GANs) and variational autoencoders (VAEs).

GANs work by making two models (neural networks) compete against each other, which is why they are called "adversarial." The first model's job is to generate an output that looks like real data. The second model (called a discriminator) tries to discern fake data from real data. The second model gets inputs of both real data and the fake (generated) data of the first model. Both models continue to train and get better at their jobs until the discriminator cannot tell fake from real data. At this point, your first model is trained to output very realistic data and can be used for generative AI.

VAEs also have two models, but they do different things. The first model takes a lot of data and converts it into a simplified set of numbers (we call this encoding, which is where the "autoencoder" term comes from). Those numbers are then organized. The second model takes those simplified numbers and tries to generate the original data, or as close to it as possible.
It's sort of like dehydrating food and then reconstituting it — the goal is for the second model to reconstruct the original data as closely as possible. When the second model gets really good at this, the training is completed. It becomes a generator. The trick is that the simplified numbers in the middle of this training process were organized in a logical manner. The result of that organization means our inputs now generate logical outputs in the same way the original data was organized.

Using AI insights to solve real-world problems

Let's look at this in practice. I had some fun building a GAN for profiles of whiskey. I scraped the web for various whiskey reviews, converted those into tasting profiles, and then trained the GAN on that data. Eventually, I could ask the generative model to output 1,000 unique whiskey profiles a master distiller might realistically create. So what did I do with that generated data? I analyzed it and used the insights to help my own home aging techniques. It's like having a huge survey of master distillers' advice on what profiles to develop.

Let's apply this to problems faced by governments globally. Here are some questions that, with the right data and training, these models could help answer:

- For banking, financial regulatory, and AML oversight: What might new forms of money laundering look like? Can we generate 10K synthetic financial statements that show us the risk of money laundering? What could we learn from that?

- For military and transportation departments: What different logistics plans solve our needs but in unique ways? If we looked at a large sample of logistics routes that all met our mission, would we see trade-offs between decisions we never noticed before?

- For central banks: What fiscal policies might help to reduce bank failures given our plans to change interest rate targets? If we could run a distribution of simulated bank outcomes to a monetary change, would we discover unforeseen effects and risks?

- For counterintelligence: What unknown patterns of behavior might indicate intelligence gathering? Could we identify collection methods not in use or unknown to us? Could we identify sources we didn't realize existed?

AI is out of the barn

There is a whole world of generative options beyond LLMs. In this post we looked at a macro-framework to prepare us for the coming AI revolution, unpacked the depths of what an LLM can offer, and explored other generative AI models. I'd like to share a final example of a policy point that affects global governments and those they regulate. We're moving to a point in time when all decisions will require consulting an AI. Not because the AI will be right all the time, but because it will soon be irresponsible not to have weighed an AI input, a relevant statistical model of reality, during the decision-making process. There is no going back on innovation. In fact, ChatGPT gave me 14 idioms that convey this exact idea, including one I hadn't heard before but which makes perfect sense: "The horse is out of the barn."

The post Predicting the Generative AI Revolution Requires Learning From Our Past appeared first on Snowflake. View the full article
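To make the adversarial loop described above a bit more concrete, here is a minimal, purely illustrative sketch in PyTorch (my own, not from the post): a tiny generator learns to mimic a made-up 1-D "tasting score" distribution while a discriminator tries to tell real samples from generated ones. All shapes, data and hyperparameters are hypothetical.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Generator: 8-dim noise -> one synthetic "tasting score"
generator = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
# Discriminator: score -> probability the sample is real
discriminator = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 3.0   # pretend "real" scores cluster around 3.0
    fake = generator(torch.randn(64, 8))

    # Discriminator step: real samples labeled 1, generated samples labeled 0
    d_opt.zero_grad()
    d_loss = loss_fn(discriminator(real), torch.ones(64, 1)) + \
             loss_fn(discriminator(fake.detach()), torch.zeros(64, 1))
    d_loss.backward()
    d_opt.step()

    # Generator step: try to make the discriminator call the fakes real
    g_opt.zero_grad()
    g_loss = loss_fn(discriminator(fake), torch.ones(64, 1))
    g_loss.backward()
    g_opt.step()

# After training, generated samples should drift toward the "real" range near 3.0
print(generator(torch.randn(5, 8)).detach().squeeze())
```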
-
Snowflake is committed to helping our customers unlock the power of artificial intelligence (AI) to drive better decisions, improve productivity and reach more customers using all types of data. Large language models (LLMs) are a critical component of generative AI applications, and multimodal models are an exciting category that allows users to go beyond text and incorporate images and video into their prompts to get a better understanding of the context and meaning of the data.

Today we are excited to announce we're furthering our partnership with Reka to support its suite of highly capable multimodal models in Snowflake Cortex. This includes Flash, an optimized model for everyday questions, and developing support for Core, Reka's largest and most performant model. This will allow our customers to seamlessly unlock value from more types of data with the power of multimodal AI in the same environment where their data lives, protected by the built-in security and governance of the Snowflake Data Cloud. Reka's latest testing reveals that both Flash and Core are highly capable, with Core's capabilities approaching GPT-4 and Gemini Ultra, making it one of the most capable LLMs available today.

In addition to expanding our partnership with NVIDIA to power gen AI applications and enhance model performance and scalability, our partnerships with Reka and other LLM providers are the latest examples of how Snowflake is accelerating our AI capabilities for customers. Snowflake remains steadfast in our commitment to make AI secure, easy to use and quick to implement, for both business and technical users. Taken together, our partnerships and investments in AI ensure we continue to provide customers with maximum choice around the tools and technologies they need to build powerful AI applications.

The post Snowflake Brings Gen AI to Images, Video and More With Multimodal Language Models from Reka in Snowflake Cortex appeared first on Snowflake. View the full article
-
Gen AI/LLMs have both long-term benefits and risks. But what will that look like?

It's no surprise that ever since ChatGPT's broader predictive capabilities were made available to the public in November 2022, the sprawl of stakeholder capitalization on large language models (LLMs) has permeated nearly every sector of modern industry, accompanied or exacerbated by collective fascination. Financial services is no exception. But what might this transformation look like, from practical applications to potential risks? In this blog post, we'll walk through what the actual capabilities of LLMs are, and the steps financial organizations should take to harness this technology effectively and safely.

How do Gen AI/LLMs "democratize" access to insights?

Compared to traditional AI models, generative AI/LLMs provide a significant uplift in tackling the wealth of unstructured data that makes up things such as loan agreements, claims agreements, underwriting documents and the like. LLMs stand out in the following capabilities:

- Content synthesis: Generative AI models can process huge amounts of multimodal information (text, images, video, audio, etc.) and synthesize content in a short period of time and to a very reasonable accuracy.

- Information extraction: Text generators like ChatGPT are efficient information retrievers capable of generating responses to specific human input, such as questions or requests. The main difference between this kind of extraction and that of a simple computer output is that there's lexical fluidity in the human-computer interaction.

- Content generation: Image generators like DALL-E, having gained much public traction and direct usage, teach us that AI models can imitate human paintings, music videos or even phonetics. The ability to translate content from one language to another, be it a human or systems language, represents a huge benefit to all industries, including financial services.

This is all to say that generative AI and LLMs are significantly accretive from a usage perspective. They greatly reduce complexity for both technical and non-technical audiences, whether that means automating certain financial processes, for example, or generating a layman-readable summary of technical documents. Text generators like ChatGPT (and other AI/ML systems) fulfill the public desire for this kind of accessibility and ease of use — even for a model so technically complex, anyone can use it. Democratization also carries risk, though.

Potential pitfalls and why governance matters

The financial services sector, being heavily regulated, has a framework and structure for addressing AI governance and model validation in most industry organizations. However, in most cases, these frameworks will need to be assessed and upgraded in light of the new risks amplified by generative AI (such as hallucinations and intellectual property rights exposure) as well as the evolving regulatory landscape. Regulatory bodies worldwide have issued guidelines around the use of AI/ML models, with the level of prescriptive guidance in these regulations varying by region/country. For example, the Biden administration recently issued an executive order (the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence) that highlights the importance of fairness, testing, security and privacy in the development and usage of AI and ML models.

So what are the benefits?

Nevertheless, there are enduring advantages to adopting generative AI and LLMs in financial services.
Its novelty will likely mature into prevalence as computing resources proliferate and the costs of adoption decrease. As to how pervasive generative AI might be in the infrastructure of any given organization, we can only speculate. It's clear that, in the short term, curiosity prevails and most financial services firms are experimenting with the technology for several relevant use cases in their given organization. AI deployment in applications has already been prevalent for years, but generative AI expands that domain by automating routine functions of information review, parsing and synthesis — especially with unstructured data. And, of course, querying data by simply prompting a chatbot with instructions or questions is already possible, which means customer assistance with AI virtual assistants is right around the corner. Investment firms also benefit from more proficient generative AI data queries to extract insights about macroeconomic conditions, regulatory or company filings and more.

In the long term, generative AI will be highly cost effective and will drive operations cost reductions through automation efficiencies. While caution and affordability concerns relegate AI use cases to simple and almost "secondary" assistive roles, the ultimate consensus is that this short-term assistance will transition to more involved, long-term automation and embedded AI at the very core of every business process. The transition, though, is not as simple as it sounds.

How do we transition to generative AI?

For some, the constant discourse around the AI paradigm shift seems to be more than just the usual noise. I think it is clear that AI — particularly LLMs — is here to stay. If anything, organizations are going to make it an integral part of every business process. But accompanying this question of how we transition is the more pressing concern of whether we're even ready for that transition, especially given how nascent generative AI is and how unprepared many data strategies are for this shift.

An AI model is only as good as the data that you put in. Investing in a strong data infrastructure addresses the preliminary bugs and holes of siloed data and fragmentation. Implementing LLMs on top of a structure already plagued by incompatible data architectures — or perhaps inadequate talent onboarding to even support that implementation — will only create more problems. Not to mention that a lack of robust governance frameworks for privacy compliance and potential AI-created hallucinations will only inflate ethical and security concerns. The smoothness of the transition, therefore, depends on maturity or, as AI researcher Ashok Goel describes it, "muscle-building."

The noise of generative AI's "disruption" is not as loud or as dramatic as many believe. It isn't a sudden rupturing of the ecosystem or a rapid scramble to immediately transform whole organizations. Instead, generative AI and LLMs will shift business processes for financial services, but slowly, and only if organizations first optimize their data strategies and infrastructures. To learn more about LLMs in financial services and how Snowflake can help, check out the full interview on DCN's channel.

The post How Financial Services Should Prepare for Generative AI appeared first on Snowflake. View the full article
-
Data is your generative AI differentiator, and a successful generative AI implementation depends on a robust data strategy incorporating a comprehensive data governance approach. Working with large language models (LLMs) for enterprise use cases requires the implementation of quality and privacy considerations to drive responsible AI. However, enterprise data generated from siloed sources, combined with the lack of a data integration strategy, creates challenges for provisioning the data for generative AI applications. The need for an end-to-end strategy for data management and data governance at every step of the journey—from ingesting, storing, and querying data to analyzing, visualizing, and running artificial intelligence (AI) and machine learning (ML) models—continues to be of paramount importance for enterprises.

In this post, we discuss the data governance needs of generative AI application data pipelines, a critical building block to govern data used by LLMs to improve the accuracy and relevance of their responses to user prompts in a safe, secure, and transparent manner. Enterprises are doing this by using proprietary data with approaches like Retrieval Augmented Generation (RAG), fine-tuning, and continued pre-training with foundation models. Data governance is a critical building block across all these approaches, and we see two emerging areas of focus.

First, many LLM use cases rely on enterprise knowledge that needs to be drawn from unstructured data such as documents, transcripts, and images, in addition to structured data from data warehouses. Unstructured data is typically stored across siloed systems in varying formats, and generally not managed or governed with the same level of rigor as structured data. Second, generative AI applications introduce a higher number of data interactions than conventional applications, which requires that the data security, privacy, and access control policies be implemented as part of the generative AI user workflows.

In this post, we cover data governance for building generative AI applications on AWS with a lens on structured and unstructured enterprise knowledge sources, and the role of data governance during the user request-response workflows.

Use case overview

Let's explore an example of a customer support AI assistant. The following figure shows the typical conversational workflow that is initiated with a user prompt. The workflow includes the following key data governance steps:

- Prompt user access control and security policies.
- Access policies to extract permissions based on relevant data and filter out results based on the prompt user role and permissions.
- Enforce data privacy policies such as personally identifiable information (PII) redactions (see the sketch after this section).
- Enforce fine-grained access control.
- Grant the user role permissions for sensitive information and compliance policies.

To provide a response that includes the enterprise context, each user prompt needs to be augmented with a combination of insights from structured data from the data warehouse and unstructured data from the enterprise data lake. On the backend, the batch data engineering processes refreshing the enterprise data lake need to expand to ingest, transform, and manage unstructured data. As part of the transformation, the objects need to be treated to ensure data privacy (for example, PII redaction). Finally, access control policies also need to be extended to the unstructured data objects and to vector data stores.
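As a hedged illustration of the PII-redaction step above (my own sketch, not from the post), Amazon Comprehend's DetectPiiEntities API can locate sensitive spans so they can be masked before data reaches downstream zones; the sample text and region are placeholders.

```python
import boto3

comprehend = boto3.client("comprehend", region_name="us-east-1")

# Hypothetical transcript snippet; a real pipeline would process each object from the raw zone.
text = "Customer Jane Doe, SSN 123-45-6789, called about claim C-1001."

resp = comprehend.detect_pii_entities(Text=text, LanguageCode="en")

# Mask every detected PII span before writing to the transformed/curated zone.
masked = list(text)
for entity in resp["Entities"]:
    for i in range(entity["BeginOffset"], entity["EndOffset"]):
        masked[i] = "*"
print("".join(masked))
```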
Let's look at how data governance can be applied to the enterprise knowledge source data pipelines and the user request-response workflows.

Enterprise knowledge: Data management

The following figure summarizes data governance considerations for data pipelines and the workflow for applying data governance. In the figure, the data engineering pipelines include the following data governance steps:

- Create and update a catalog through data evolution.
- Implement data privacy policies.
- Implement data quality by data type and source.
- Link structured and unstructured datasets.
- Implement unified fine-grained access controls for structured and unstructured datasets.

Let's look at some of the key changes in the data pipelines, namely data cataloging, data quality, and vector embedding security, in more detail.

Data discoverability

Unlike structured data, which is managed in well-defined rows and columns, unstructured data is stored as objects. For users to be able to discover and comprehend the data, the first step is to build a comprehensive catalog using the metadata that is generated and captured in the source systems. This starts with the objects (such as documents and transcript files) being ingested from the relevant source systems into the raw zone in the data lake in Amazon Simple Storage Service (Amazon S3) in their respective native formats (as illustrated in the preceding figure). From here, object metadata (such as file owner, creation date, and confidentiality level) is extracted and queried using Amazon S3 capabilities. Metadata can vary by data source, and it's important to examine the fields and, where required, derive the necessary fields to complete all the necessary metadata. For instance, if an attribute like content confidentiality is not tagged at a document level in the source application, this may need to be derived as part of the metadata extraction process and added as an attribute in the data catalog. The ingestion process needs to capture object updates (changes, deletions) in addition to new objects on an ongoing basis. For detailed implementation guidance, refer to Unstructured data management and governance using AWS AI/ML and analytics services. To further simplify the discovery and introspection between business glossaries and technical data catalogs, you can use Amazon DataZone for business users to discover and share data stored across data silos.

Data privacy

Enterprise knowledge sources often contain PII and other sensitive data (such as addresses and Social Security numbers). Based on your data privacy policies, these elements need to be treated (masked, tokenized, or redacted) from the sources before they can be used for downstream use cases. From the raw zone in Amazon S3, the objects need to be processed before they can be consumed by downstream generative AI models. A key requirement here is PII identification and redaction, which you can implement with Amazon Comprehend. It's important to remember that it will not always be feasible to strip away all the sensitive data without impacting the context of the data. Semantic context is one of the key factors that drive the accuracy and relevance of generative AI model outputs, and it's critical to work backward from the use case and strike the necessary balance between privacy controls and model performance.

Data enrichment

In addition, further metadata may need to be extracted from the objects.
Amazon Comprehend provides capabilities for entity recognition (for example, identifying domain-specific data like policy numbers and claim numbers) and custom classification (for example, categorizing a customer care chat transcript based on the issue description). Furthermore, you may need to combine the unstructured and structured data to create a holistic picture of key entities, like customers. For example, in an airline loyalty scenario, there would be significant value in linking unstructured data captured from customer interactions (such as customer chat transcripts and customer reviews) with structured data signals (such as ticket purchases and miles redemption) to create a more complete customer profile that can then enable the delivery of better and more relevant trip recommendations. AWS Entity Resolution is an ML service that helps in matching and linking records. This service helps link related sets of information to create deeper, more connected data about key entities like customers, products, and so on, which can further improve the quality and relevance of LLM outputs. This data is available in the transformed zone in Amazon S3 and is ready to be consumed downstream for vector stores, fine-tuning, or training of LLMs. After these transformations, data can be made available in the curated zone in Amazon S3.

Data quality

A critical factor in realizing the full potential of generative AI is the quality of the data used to train the models, as well as the data used to augment and enhance the model response to a user input. Understanding the models and their outcomes in the context of accuracy, bias, and reliability is directly proportional to the quality of data used to build and train the models. Amazon SageMaker Model Monitor provides proactive detection of deviations in model data quality and model quality metrics. It also monitors bias drift in your model's predictions and feature attribution. For more details, refer to Monitoring in-production ML models at large scale using Amazon SageMaker Model Monitor. Detecting bias in your model is a fundamental building block of responsible AI, and Amazon SageMaker Clarify helps detect potential bias that can produce a negative or a less accurate result. To learn more, see Learn how Amazon SageMaker Clarify helps detect bias.

A newer area of focus in generative AI is the use and quality of data in prompts from enterprise and proprietary data stores. An emerging best practice to consider here is shift-left, which puts a strong emphasis on early and proactive quality assurance mechanisms. In the context of data pipelines designed to process data for generative AI applications, this implies identifying and resolving data quality issues earlier upstream to mitigate the potential impact of data quality issues later. AWS Glue Data Quality not only measures and monitors the quality of your data at rest in your data lakes, data warehouses, and transactional databases, but also allows early detection and correction of quality issues in your extract, transform, and load (ETL) pipelines to ensure your data meets quality standards before it is consumed. For more details, refer to Getting started with AWS Glue Data Quality from the AWS Glue Data Catalog.

Vector store governance

Embeddings in vector databases elevate the intelligence and capabilities of generative AI applications by enabling features such as semantic search and reducing hallucinations.
Embeddings typically contain private and sensitive data, and encrypting the data is a recommended step in the user input workflow. Amazon OpenSearch Serverless stores and searches your vector embeddings, and encrypts your data at rest with AWS Key Management Service (AWS KMS). For more details, see Introducing the vector engine for Amazon OpenSearch Serverless, now in preview. Similarly, additional vector engine options on AWS, including Amazon Kendra and Amazon Aurora, encrypt your data at rest with AWS KMS. For more information, refer to Encryption at rest and Protecting data using encryption.

As embeddings are generated and stored in a vector store, controlling access to the data with role-based access control (RBAC) becomes a key requirement for maintaining overall security. Amazon OpenSearch Service provides fine-grained access control (FGAC) features with AWS Identity and Access Management (IAM) rules that can be associated with Amazon Cognito users. Corresponding user access control mechanisms are also provided by OpenSearch Serverless, Amazon Kendra, and Aurora. To learn more, refer to Data access control for Amazon OpenSearch Serverless, Controlling user access to documents with tokens, and Identity and access management for Amazon Aurora, respectively.

User request-response workflows

Controls in the data governance plane need to be integrated into the generative AI application as part of the overall solution deployment to ensure compliance with data security (based on role-based access controls) and data privacy (based on role-based access to sensitive data) policies. The following figure illustrates the workflow for applying data governance. The workflow includes the following key data governance steps:

- Provide a valid input prompt for alignment with compliance policies (for example, bias and toxicity).
- Generate a query by mapping prompt keywords with the data catalog.
- Apply FGAC policies based on user role.
- Apply RBAC policies based on user role.
- Apply data and content redaction to the response based on user role permissions and compliance policies.

As part of the prompt cycle, the user prompt must be parsed and keywords extracted to ensure alignment with compliance policies using a service like Amazon Comprehend (see New for Amazon Comprehend – Toxicity Detection) or Guardrails for Amazon Bedrock (preview). When that is validated, if the prompt requires structured data to be extracted, the keywords can be used against the data catalog (business or technical) to extract the relevant data tables and fields and construct a query from the data warehouse. The user permissions are evaluated using AWS Lake Formation to filter the relevant data. In the case of unstructured data, the search results are restricted based on the user permission policies implemented in the vector store. As a final step, the output response from the LLM needs to be evaluated against user permissions (to ensure data privacy and security) and compliance with safety guidelines (for example, bias and toxicity). Although this process is specific to a RAG implementation, it applies to other LLM implementation strategies as well, with the following additional controls:

- Prompt engineering – Access to the prompt templates to invoke needs to be restricted based on access controls augmented by business logic.
- Fine-tuning models and training foundation models – In cases where objects from the curated zone in Amazon S3 are used as training data for fine-tuning the foundation models, the permissions policies need to be configured with Amazon S3 identity and access management at the bucket or object level based on the requirements.

Summary

Data governance is critical to enabling organizations to build enterprise generative AI applications. As enterprise use cases continue to evolve, there will be a need to expand the data infrastructure to govern and manage new, diverse, unstructured datasets to ensure alignment with privacy, security, and quality policies. These policies need to be implemented and managed as part of data ingestion, storage, and management of the enterprise knowledge base, along with the user interaction workflows. This makes sure that the generative AI applications not only minimize the risk of sharing inaccurate or wrong information, but also protect against bias and toxicity that can lead to harmful or libelous outcomes. To learn more about data governance on AWS, see What is Data Governance? In subsequent posts, we will provide implementation guidance on how to expand the governance of the data infrastructure to support generative AI use cases.

About the Authors

Krishna Rupanagunta leads a team of Data and AI Specialists at AWS. He and his team work with customers to help them innovate faster and make better decisions using Data, Analytics, and AI/ML. He can be reached via LinkedIn.

Imtiaz (Taz) Sayed is the WW Tech Leader for Analytics at AWS. He enjoys engaging with the community on all things data and analytics. He can be reached via LinkedIn.

Raghvender Arni (Arni) leads the Customer Acceleration Team (CAT) within AWS Industries. The CAT is a global cross-functional team of customer-facing cloud architects, software engineers, data scientists, and AI/ML experts and designers that drives innovation via advanced prototyping and cloud operational excellence via specialized technical expertise. View the full article
-
free courses Datacamp Free Access Week
James posted a topic in Databases, Data Engineering & Data Science
Access all of Datacamp's 460+ data and AI courses, career tracks & certifications ... https://www.datacamp.com/freeweek
-
- datacamp
- data engineering
- (and 9 more)
-
Last year, we published the Big Book of MLOps, outlining guiding principles, design considerations, and reference architectures for Machine Learning Operations (MLOps). Since then, Databricks has added key features simplifying MLOps, and Generative AI has brought new requirements to MLOps platforms and processes. We are excited to announce a new version of the Big Book of MLOps covering these product updates and Generative AI requirements. This blog post highlights key updates in the eBook, which can be downloaded here ... View the full article
-
Forum Statistics
63.6k Total Topics
61.7k Total Posts