Stop losing leads in the "message pile." Our Multimodal AI WhatsApp Chatbot is a sophisticated virtual assistant that does more than just chat—it understands your business context. It listens to voice notes, analyzes complex PDF specifications, and "sees" screenshots of charts or invoices to provide instant, expert-level responses. With integrated long-term memory, the bot recognizes returning customers, recalls past preferences, and maintains a seamless conversation flow 24/7. Reduce your team’s workload by 80% and respond to every inquiry in under 10 seconds, ensuring you never miss a conversion during peak market activity.
What is a "multimodal" AI WhatsApp Chatbot? Have you ever noticed how many messages your customers send you first thing in the morning? Say there are 47: one asks for a price, another is a 3-minute voice note, and yet another is a 20-page PDF technical spec. If you have replied to a third of those queries by lunchtime, the other two thirds are probably lost in the message pile.
If this sounds familiar to you, read on. We are going to discuss an AI-powered WhatsApp Chatbot that will solve this issue forever!
A WhatsApp Chatbot is essentially an automated programme that lives inside your WhatsApp messenger app and answers your customers on your behalf. The key distinction is that this bot supports multiple modalities; it can interpret text, voice messages, images and PDF documents. Because your chatbot remembers the history of correspondence between you and your customers, you won't have to start from scratch with each new conversation.
To operate, the chatbot connects to the WhatsApp Business API and captures incoming messages. It parses each message with a language model, such as GPT-4 or our proprietary crypto chatbot model ASCN, analyzes the request against the history of the conversation, and formulates a response before delivering it to the customer. The entire process takes a matter of seconds, faster than you can switch between tabs in your web browser.
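The capture-and-route step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not ASCN.AI's implementation: the payload shape follows the WhatsApp Cloud API webhook format (a provider such as Twilio may deliver a different shape), and `extract_messages`, `route`, and the handler labels in `HANDLERS` are hypothetical names chosen for this example.

```python
from typing import Any

# Hypothetical mapping from WhatsApp message type to a processing step.
HANDLERS = {
    "text": "language_model",
    "audio": "transcribe_then_language_model",
    "image": "vision_model",
    "document": "pdf_parser",
}


def extract_messages(payload: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten a webhook payload into simple message records."""
    records = []
    for entry in payload.get("entry", []):
        for change in entry.get("changes", []):
            value = change.get("value", {})
            for msg in value.get("messages", []):
                kind = msg.get("type", "unknown")
                body = msg.get(kind)
                if not isinstance(body, dict):
                    body = {}  # some types (e.g. contacts) are lists, not objects
                records.append({
                    "sender": msg.get("from"),
                    "type": kind,
                    # Text messages carry a body; media messages carry a media id.
                    "content": body.get("body") if kind == "text" else body.get("id"),
                })
    return records


def route(record: dict[str, Any]) -> str:
    """Pick a processing step; unknown types fall back to a human operator."""
    return HANDLERS.get(record["type"], "human_operator")
```

Each record would then be passed, together with the stored conversation history, to whichever model handles that message type.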
Here are a couple of practical use cases:
Crypto speculation and trading: Suppose a customer sends a screenshot of a chart. The bot analyzes the token's data, including volume, number of holders, and news from Telegram, and returns a short summary: for example, that the price has risen by more than 12% in 24 hours, large investors are becoming active, and a Coinbase listing is being discussed. When the customer asks about the risks associated with Solana an hour later, the bot recalls the conversation and breaks down Solana's volatility and liquidity.
Retail and E-commerce: When a client sends an image of a product and asks whether it is in stock, the bot identifies the product, checks warehouse inventory, and replies with a quote and an option to purchase through the chat. Memory helps here too: if the same client returns a week later and asks about the same product, the bot recalls the earlier exchange, responds with the information, and answers any follow-up questions the client may have.
B2B and Consulting: A client sends a PDF containing a 20-page work specification. The bot extracts the key terms of the document, gauges the scope of the work, and asks clarifying questions before preparing a preliminary proposal for the client. The bot retains all of this information in memory, so managers keep the full history of each client relationship and can easily review past interactions.
Education and Online Course Providers: A student sends a voice message saying they did not fully understand the topic of blockchain consensus from their course. The bot transcribes the voice message, explains blockchain consensus in simpler terms, and provides links to other educational materials on the topic. If the student asks for help in the same area the following day, the bot can identify the previous conversation and respond accordingly.
The beauty of this kind of multimodal interaction is that clients can communicate in whatever way suits their situation (text, voice, or document), and there is never any confusion about what has been shared.
The following list outlines some of the benefits of having a chatbot with memory:
The bot has human-like characteristics when communicating. The chatbot does not simply provide a reply; it maintains information related to the client, including their name, their past interactions, their preferences, etc. This allows the chatbot to engage with a returning client as if they were still in touch: "Hello Ivan; last time we spoke you were interested in an automated reporting solution. Is it still a possibility?"
According to a recent Salesforce report, 73% of clients want a personalized experience. This is exactly what a chatbot equipped with memory provides: the client experiences the chat as a continuation of an existing relationship rather than a first meeting.
Faster and more efficient responses. Auto-responses have reduced response time by 92%, while conversions have increased by 40%. For instance, at one crypto company supported by ASCN.AI, the time to first response fell from 4 hours to 8 seconds and the conversion rate increased by 41%. With a chatbot, the client receives a response exactly when they want it, before they have lost interest.
Here is an example. A client writes in the evening to request a consultation on crypto arbitrage. Without a chatbot, the client would receive a reply the next morning and might turn to a competitor in the meantime. With the chatbot, within 10 seconds of writing they receive a response with a detailed list of available strategies, a link to an online calculator, and an invitation to schedule a consultation call.
During the flash crash on October 11, 2024, the bot handled over 2,400 requests in a single evening. Handling that volume manually would have required 15 staff; the bot, however, gave each client a customized response based on their portfolio and level of expertise.
Because of the bot, the managers' team spent 80 percent less time on standard customer service questions (e.g. "How do I pay?" or "Where is my order?"). Managers could therefore concentrate on more complex cases that require human expertise, time, and attention.
At one crypto agency, before the bot was deployed, three managers spent 60 percent of their time answering standard questions with template answers. After deploying the bot, that share dropped to 12 percent. The freed-up managers could spend their time helping larger clients and developing new products. During the quarter, revenue increased by 28 percent without additional staffing.
The bot works 24/7, does not tire, does not get ill, and is available wherever the client is located, whether in Asia, Europe, or Latin America. International projects need around-the-clock support, which is generally far too costly and complicated to provide with human staff alone.
Based on our experience, about 40 percent of conversions for crypto products occur outside standard business hours, when market activity is high and decisions are made impulsively. Without a rapid reply, the money is lost; with the bot, the market is monitored around the clock.
Text. The bot processes text messages in their original form using the language model. Each message is analyzed for intent and context before a reply is generated. A message shorter than 500 characters takes 2 to 4 seconds to process. For example, if someone asks "What is a funding rate and how do you earn from it?", the bot replies with a complete explanation of what a funding rate is and how it works, an example using 1x leverage, the risks involved, and a link to a funding rate calculator. If the same person asks "Which exchanges are best to use?" a minute later, the bot already has the context and can answer without the user repeating anything from the previous question.
Voice Messages. The bot accepts audio messages and transcribes them to text using the Whisper API or another service. Transcription takes on average about 1 second per 10 seconds of speech, plus a little extra time to generate the answer. A good example is someone in a car who sends a voice message about their BTC, ETH and altcoin portfolio and asks what to do if the market drops. The bot transcribes the voice message, checks the user's portfolio and offers three options: hold, partially take profit, or hedge with a leveraged short position. The answer is structured and includes values for each scenario. The bot can also detect user emotion and tone; if a user sounds anxious or sends a series of questions, that user is marked as a priority and may be escalated to a live manager.
Images. The bot can instantly translate written documents and visually interpret screenshots of technical analysis (TA) charts, documents (including technical reports), and infographics (such as pie charts) using GPT-4's multimodal capabilities. For example, a trader sends a screenshot of a TradingView TA chart for a cryptocurrency pair (CryptoC) showing key support levels. Once the bot receives the screenshot, it recognizes that the token being analyzed is CryptoC, retrieves the relevant technical indicators (e.g. RSI, volume), and pulls on-chain data and related news in order to produce a possible price projection. In this case it reports that CryptoC has strong support at $142, volume is decreasing, and larger institutional players are accumulating. It therefore estimates the probability of an upward price movement at approximately 65%, provided the price does not break the established support level at $138.
Similarly, a regular customer (e.g. a business owner) who has a photograph of a paper invoice can send the image to the bot and ask it to convert the invoice into a structured table showing the invoice number, invoice date, line items, and total cost in either Google Sheets or Excel.
PDF Files. The bot downloads the PDF file, extracts its text, tables and overall layout, and stores that information in its conversational memory. The customer can then ask about specific details of the PDF, and the bot quickly identifies and highlights the relevant sections. For example, if a customer sends a 35-page document outlining the technical requirements for a crypto-analytics AI agent, the bot highlights the project's primary focus areas within 8 seconds (e.g., exchange integrations, on-chain data, sentiment analysis), estimates the resources required, asks the customer clarifying questions, and then provides an initial cost estimate. All of this is retained in the bot's memory, so if the customer returns a week later, the bot still remembers the original 35-page document and can provide an updated cost estimate without repeating its clarifying questions. A single PDF file may contain at most 50 pages, and an image file may be at most 20 MB. Larger files are best split into parts or uploaded through the web.
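The file limits above lend themselves to a simple pre-check before a file reaches the model. The sketch below is illustrative only: the function names, the return labels, and the reading of "20 MB" as 20 × 1024 × 1024 bytes are assumptions, not documented behavior of the bot.

```python
MAX_PDF_PAGES = 50                   # limit stated in the text
MAX_IMAGE_BYTES = 20 * 1024 * 1024   # 20 MB, assuming binary megabytes


def check_attachment(kind: str, pdf_pages: int = 0, size_bytes: int = 0) -> str:
    """Return 'accept', or a hint to split the file or upload it via the web."""
    if kind == "pdf" and pdf_pages > MAX_PDF_PAGES:
        return "split_or_upload_via_web"
    if kind == "image" and size_bytes > MAX_IMAGE_BYTES:
        return "split_or_upload_via_web"
    return "accept"


def page_chunks(total_pages: int, chunk: int = MAX_PDF_PAGES) -> list[tuple[int, int]]:
    """Split a long PDF into inclusive (first_page, last_page) ranges."""
    return [
        (start, min(start + chunk - 1, total_pages))
        for start in range(1, total_pages + 1, chunk)
    ]
```

For a 120-page specification, `page_chunks(120)` yields three ranges that each fit within the 50-page limit and could be sent as separate files.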
The bot maintains memory on three levels to ensure every detail is retained:
At the technical level, all interactions with the user are stored as vectors in a database. Every interaction is converted to an embedding, and when a new request arrives, the bot searches the stored embeddings for passages with a similar meaning. This enables the bot to interpret requests even when they are phrased differently from previous ones.
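To make the store-and-search idea concrete, here is a toy sketch. It is not the production design: `embed` uses plain word counts as a stand-in for a real embedding model, an in-memory list stands in for the vector database, and the class and function names are hypothetical. Because word counts only match shared words, this toy cannot recognize true paraphrases the way learned embeddings can.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ConversationMemory:
    """In-memory stand-in for a vector database of past messages."""

    def __init__(self) -> None:
        self._items: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self._items.append((text, embed(text)))

    def search(self, query: str, top_k: int = 1) -> list[str]:
        """Return the stored messages most similar to the query."""
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:top_k]]


memory = ConversationMemory()
memory.add("I am interested in an automated reporting solution")
memory.add("Where is my order?")
best_match = memory.search("Is the reporting solution still available?")
```

The retrieved messages would be added to the language model's prompt so the reply can build on what the client said earlier.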
First-Line Support. The bot captures every message, assesses whether it is a question, complaint, product request, or technical failure, and replies to less complex inquiries directly. More complex messages are forwarded to a live operator together with the complete message history, so the operator can provide more thorough assistance. In the crypto industry, the bot resolves a user's question without human involvement 68% of the time.
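The triage logic can be illustrated with a small rule-based sketch. In practice the classification would be done by the language model; the keyword lists, the choice of which categories count as "complex", and the names `classify` and `triage` are all assumptions made for this example.

```python
# Hypothetical keyword lists; a real deployment would classify with the language model.
CATEGORY_KEYWORDS = {
    "complaint": ("refund", "unhappy", "complaint"),
    "technical_failure": ("error", "bug", "not working", "failed"),
    "product_request": ("price", "buy", "order", "in stock"),
}

# Assumption: complaints and technical failures are treated as complex cases.
ESCALATE = {"complaint", "technical_failure"}


def classify(message: str) -> str:
    """Return the first category whose keyword appears, else 'question'."""
    text = message.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "question"


def triage(message: str, history: list[str]) -> dict:
    """Decide who answers and attach the complete message history."""
    category = classify(message)
    handler = "live_operator" if category in ESCALATE else "bot"
    return {"handler": handler, "category": category, "history": history + [message]}
```

Passing the full `history` along with an escalation is what lets the operator pick up the conversation without asking the client to repeat themselves.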
Lead Qualification. The bot collects information from users about their interests, budget, and timeline, determines which are active prospects and which are not yet ready for engagement, and creates a summary report for the manager.
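A qualification rule of this kind might look like the sketch below. The `Lead` fields mirror the three data points named above, but the thresholds (a minimum budget of $1,000 and a timeline of at most 30 days) and the report format are purely illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Lead:
    name: str
    interest: str
    budget_usd: float
    timeline_days: int


def qualify(lead: Lead, min_budget_usd: float = 1000.0, max_timeline_days: int = 30) -> str:
    """Label a lead 'active' when budget and timeline meet the (assumed) thresholds."""
    if lead.budget_usd >= min_budget_usd and lead.timeline_days <= max_timeline_days:
        return "active"
    return "not_ready"


def summarize(leads: list[Lead]) -> str:
    """Build a plain-text summary report for the manager, one line per lead."""
    return "\n".join(
        f"{lead.name}: {qualify(lead)} "
        f"({lead.interest}, ${lead.budget_usd:,.0f}, {lead.timeline_days} days)"
        for lead in leads
    )
```

The manager would receive the output of `summarize` and could start with the leads marked active.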
Onboarding. Following a sale, the bot starts an onboarding sequence: a series of messages that welcome the customer, provide instructions, answer frequently asked questions, and send reminders. The sequence can be custom-designed for each course or service and its audience.
Feedback. The bot runs periodic client surveys, analyzes the responses, and passes the findings to the development team for use in continuous improvement.
WhatsApp API Compatibility/Setup. Bots are deployed on WhatsApp through the WhatsApp Business API. Access to the API must be purchased through a Meta partner, that is, a Business Solution Provider such as Twilio or 360dialog.
The following are the typical steps involved in integrating with WhatsApp Business API:
Server requirements when self-hosted: a minimum of 2 GB of RAM and a stable internet connection. In the ASCN.AI ecosystem, the entire system is pre-configured and ready for use. WhatsApp limits: a brand-new WhatsApp account can send up to 1,000 messages per day. Once the account is confirmed, the limit can rise to 10,000 and eventually to 100,000 messages per day. Incoming messages are not limited.
Artificial Intelligence & Natural Language Processing (NLP). The bots utilize advanced language models to identify what the client wants and reply with optimal accuracy.
Case Study 1: Profiting from the Falcon Finance flash crash. On the evening of October 11, the FF token dropped from $1.20 to $0.03 within 15 minutes because of an exploit. The bot detected the anomaly immediately, analyzed outbound on-chain transactions, and sent users an alert: "FF has been hacked, liquidity withdrawn, do not purchase the bounce." Those who arbitraged made between $800 and $12,000. The bot handled 2,400 requests and achieved a 100% response rate. One user woke up, saw the alert, and made $3,200 in about 2 hours; without the bot, they would have missed the opportunity.
Case Study 2: Automated first response. At a crypto agency receiving 150–200 inquiries daily, managers spent 70% of their time sending template responses. The bot closed 68% of all requests and cut the initial response time from 4 hours to 8 seconds. As a result, the conversion rate increased by 41%. Clients no longer wait half a day for answers.
Case Study 3: Introducing new users to an arbitrage service. ArbitrageScanner had experienced low retention among new users of its service. The bot gave each new user step-by-step guidance and raised retention from 60% to 84%. It also reduced the support department's workload by 55% and increased the average order value by 19% by promoting the use of active features. The bot provided short instruction guides and answered user queries almost instantly, so users were trading with confidence within just 7 days.
What types of messages are supported by the chatbot? Formatted text and emojis; voice notes (transcribed and analyzed); images such as screenshots, documents, and graphs; PDF files (up to 50 pages); videos (with audio); contact information; geolocation. Everything is handled in the same context, and the chatbot maintains a historical record of communications.
How does the chatbot retain context during conversations? Each chat message is converted into an "embedding" and saved in a vector database. When a user's question arrives, the bot searches for similar fragments from previous conversations, which lets it understand the request and tailor its reply to the user's earlier interactions, even if the last one was several days ago.
What administration/configuration requirements must be met to implement this system? You need a Facebook Business Account, a WhatsApp Business phone number, an account with an API service provider (from $40/month), and AI model tokens. Setup at ASCN.AI takes approximately 2–3 hours. No programming is required.
How is security assured? All data is transmitted over HTTPS with TLS 1.3 and stored encrypted with AES-256. Customer-specific data is kept isolated and anonymized, and logs are kept for auditing purposes. ASCN.AI complies with the EU General Data Protection Regulation (GDPR) and uses two-factor authentication.
Can the bot be configured for specific industries? Definitely! The bot can be customized for crypto (on-chain metrics and news), e-commerce (inventory/catalog), consulting (knowledge base), and education (training resources and answer verification). The bot is trained and customized for each industry individually.
