

The first time I wrote my own Selenium script to extract data from an exchange, it took me three days just to figure out the right selectors and write error-checking routines. Then, when the exchange made even minor changes, all my hard work was wasted! Now an AI bot can complete the same job in twenty minutes. Better still, the bot adapts to site changes because it understands the context of what it is doing rather than looking for elements in a fixed DOM order. This is not just a time-saver; it is an entirely new way of thinking about coding for the web!
“Traditional web scraping and traditional script development are obsolete. We used to spend dozens of hours writing code that became unusable whenever a site changed. Thanks to GPT-powered browsers, the model itself understands what the page is, how to find its elements, and how to perform the desired operation, without any code changes. In eight months we analyzed 43 different approaches to crypto analytics and concluded that both Selenium and Puppeteer only work sustainably when combined with large language models; otherwise you are stuck in an endless loop of updates just to keep the code alive.”
Browser automation (sometimes called browser scraping) means having a computer perform actions on your behalf, such as clicking buttons, entering information into forms, navigating between pages, and collecting data, all without human input. Historically, automating a web interface meant writing static scripts with very specific selectors for the exact elements you wanted to target, which produced extremely brittle, easily broken code. The whole structure collapsed at the slightest modification to the interface. With GPT and related AI models, we can now write flexible scenarios that do not follow a rigid step-by-step script but instead comprehend what is happening on the page, so the automation keeps working even after the markup changes.

Traditional scripts, which break with any change to the site, have a fundamental problem: they are very time-consuming to maintain, because markup changes break the script's logic and fixing it typically takes many hours. GPT-based automation avoids this issue because it analyzes the semantics of elements as well as their technical attributes. This matters because maintaining a script can end up costing far more than building it. With GPT-based development, maintenance time can be virtually eliminated, while development itself can be up to ten times faster.
Automating browser actions means using a program to perform actions you would normally do manually (clicking, filling out forms, pressing buttons). The primary tools for browser automation include:
Automation methods fall into two categories: imperative automation requires you to write out every single action in code, while declarative automation lets you describe the desired outcome and have the system work out the steps needed to achieve it. GPT is a good example of declarative automation: you state a task in natural language, and it executes the necessary actions via DOM analysis. For testing webpages in the cryptocurrency industry, Playwright frequently outperforms Selenium, especially with data updated every hundredth of a second over WebSockets. Selenium cannot keep up with real-time page updates, while Playwright's automatic waits make test runs more manageable. If a webpage carries too many business rules, however, an AI agent is the better option.
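The imperative/declarative contrast can be sketched in a few lines of Python. Everything here is illustrative: the step tuples, the selectors, and the `plan_steps` function are hypothetical stand-ins; in a real declarative run, `plan_steps` would send the task plus a DOM snapshot to an LLM rather than use a keyword check.

```python
# Imperative: every step and selector is spelled out by hand and breaks
# the moment the markup changes.
imperative_steps = [
    ("goto", "https://example-exchange.com/markets"),
    ("click", "#markets-table > tr:first-child a"),
    ("read_text", ".price-span"),
]

# Declarative: describe the outcome; the agent plans the steps itself.
declarative_task = "Open the markets page and read the price of the top listed token."

def plan_steps(task: str) -> list:
    """Stand-in for an LLM planner that maps a goal to concrete actions.
    A real implementation would pass `task` and a DOM snapshot to a model."""
    if "price" in task and "top" in task:
        return imperative_steps  # the agent rediscovers equivalent steps
    return []
```

The point of the sketch: in the declarative style the step list is an output of planning, not a hand-maintained input, so a markup change triggers replanning instead of a code fix.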
GPT is a large language model built on the generative transformer architecture and trained for natural language processing (NLP). It can identify concepts, answer questions, and write code in programming languages. In browser automation, the browser agent analyzes the DOM (Document Object Model), examining HTML element types and CSS rules, to determine the correct command for a specific element (click the submit button, etc.).
When using GPT to issue a command to the browser, you typically pass a DOM snapshot (or a relevant part of one) together with the task. If an element is added or modified, the model does not fail on a stale selector; instead, it looks for another element that matches the intent of the task.
A good example is how ASCN.AI's token-listing utility works across twelve different exchanges. This functionality originally required 8 hours of staff time each week for manual selector editing. Since GPT-4 was introduced, the agent fetches the HTML (and sometimes a screenshot) of the listing page, locates the table by its structure and headers, extracts the required data, and inserts it into the database. Maintenance has dropped to zero because the agent corrects itself.
AI in browser automation refers to using machine learning, primarily large language models (LLMs), to plan and execute browser automation without hard-coded scenarios or selectors. Instead of writing a list of instructions for completing a task on a website, the user simply describes the task in natural language, and the AI moves through the user interface and determines what actions are needed. An AI-based browser automation system comprises:
GPT and other AI models can streamline workflows and support the automation of almost any process. Here are the main functions of GPT and related LLMs (Claude, Llama, Gemini):
These capabilities matter all the more because most single-page applications (SPAs) are built with React or Vue, frameworks that render a dynamic, constantly changing DOM. With computer vision, models such as Anthropic's Claude can also process screenshots, adding further capabilities for browser automation. Most agent models also run continuous cycles: observe / act / check / adjust.
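The observe / act / check / adjust cycle can be sketched as a small loop. The function names and the toy "page" below are hypothetical; in a real agent, `observe` would capture a DOM snapshot, `act` would be an LLM call that chooses and performs an action, and `check` would verify the goal against the new page state.

```python
def run_agent_loop(goal, observe, act, check, max_steps=10):
    """Minimal observe/act/check/adjust cycle. `observe` returns page state,
    `act` proposes and performs the next action (an LLM call in practice),
    and `check` decides whether the goal has been reached."""
    for step in range(max_steps):
        state = observe()
        if check(state, goal):
            return {"done": True, "steps": step}
        act(state, goal)  # "adjust" is implicit: the next observe sees the result
    return {"done": False, "steps": max_steps}

# Toy simulation: the "page" is a click counter; the goal is three clicks.
page = {"clicks": 0}
result = run_agent_loop(
    goal=3,
    observe=lambda: page["clicks"],
    act=lambda state, goal: page.update(clicks=state + 1),
    check=lambda state, goal: state >= goal,
)
```

Bounding the loop with `max_steps` is important in practice: an agent that keeps failing its check should surface an error rather than burn API calls indefinitely.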
Three methods currently exist for integrating GPT into a process automation workflow:
In summary, AI has made process automation far more accessible. As long as you can describe your process's objective the way you would describe it to another person, the platform can produce a prebuilt automated workflow through a graphical builder, covering everything from triggers to API integrations with services such as Google Sheets and Telegram. In many cases, no knowledge of Playwright or the OpenAI API is needed to build these workflows.
Navigating Menus - Most websites are navigated either through the interface (a navigation bar) or by using the page's semantic model to find the desired section. When a page is updated or restructured, the path to a given section may change.
Form Filling - Filling out a form means finding the mapping between the form's fields (such as name or address) and the values you want to enter into them.
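A cheap stand-in for that mapping step, using fuzzy string matching from Python's standard library. An LLM would do this semantically and far more robustly; the profile keys, labels, and cutoff value here are illustrative assumptions.

```python
import difflib

def map_fields(form_labels, profile):
    """Match visible form labels to profile keys by string similarity,
    a simple stand-in for the semantic matching an LLM performs."""
    mapping = {}
    for label in form_labels:
        best = difflib.get_close_matches(label.lower(), profile.keys(), n=1, cutoff=0.4)
        if best:
            mapping[label] = profile[best[0]]
    return mapping

profile = {"full name": "Alice Doe", "email": "alice@example.com", "address": "1 Main St"}
filled = map_fields(["Full Name", "E-mail", "Address"], profile)
```

Note that "E-mail" still matches "email" despite the different spelling, which is exactly the kind of variation that breaks exact-selector scripts.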
Clicking on Items - Locate the target element on the page and click it. If the element cannot be found, or the click fails, retry with alternative strategies until it succeeds or a retry limit is reached.
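A retry wrapper for that click step might look like the following sketch. `find_element` and `click` are hypothetical stand-ins for real driver calls (for example, a Playwright locator and its click method); the flaky finder below just simulates an element that appears on the third attempt.

```python
import time

def click_with_retries(find_element, click, retries=3, delay=0.0):
    """Try to locate and click an element, retrying on any failure.
    Returns True on success, False once all retries are exhausted."""
    for attempt in range(retries):
        try:
            el = find_element()
            click(el)
            return True
        except Exception:
            time.sleep(delay)  # back off before the next attempt
    return False

# Simulation: the element is "not in the DOM" for the first two attempts.
attempts = {"n": 0}
def flaky_find():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("element not yet in DOM")
    return "submit-button"

clicked = []
ok = click_with_retries(flaky_find, clicked.append, retries=5)
```

In a GPT-driven agent, the retry branch is where the model can be asked for an alternative element instead of simply repeating the same lookup.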
Data Extraction and Storage - Extract tabular data (such as dates of past purchases) or other repeated records from a page, structure the information, and save it in an appropriate format in a secure location of your own.
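The extract-structure-store step can be sketched with Python's standard library alone. This flattens the first HTML table into row dictionaries and serializes them to JSON; the sample markup and column names are invented for illustration.

```python
import json
from html.parser import HTMLParser

class TableExtractor(HTMLParser):
    """Flatten an HTML table into a list of rows (lists of cell strings)."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell, self._in_cell = [], [], "", False

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self._in_cell, self._cell = True, ""

    def handle_data(self, data):
        if self._in_cell:
            self._cell += data

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
            self._row.append(self._cell.strip())
        elif tag == "tr" and self._row:
            self.rows.append(self._row)
            self._row = []

ex = TableExtractor()
ex.feed("<table><tr><th>Date</th><th>Amount</th></tr>"
        "<tr><td>2024-01-05</td><td>120.50</td></tr></table>")
header, *body = ex.rows
records = [dict(zip(header, row)) for row in body]
payload = json.dumps(records)  # ready to write to disk or a database
```

Pairing the header row with each body row yields self-describing records, so downstream code does not depend on column order.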
Scenario 1 - Monitoring for New Token Listings
In one case, this identified a Binance listing 12 minutes before other traders saw it, and the token rose 34% in value within minutes of the listing.
Scenario 2 - Automating KYC Forms on Exchanges
Overall, this reduced the time required to complete a KYC form from 15-20 minutes to 3 minutes, saving more than 11 hours per month.
Scenario 3 - Tracking Whale Transactions
This workflow identified capital outflows from an exchange 48 hours before a market collapse.
The agent begins by dividing a given objective into small subtasks, completes them, and returns the results to the user. In practice, the user (1) defines how to access the agent and (2) defines the task(s) to be completed; the agent then (3) builds a plan, executes it, and validates that the request has been fulfilled. Standard ChatGPT provides an assistant function: for example, it can generate a command over the DOM that is then executed in a separate execution environment. ASCN.AI builds on custom agents, Web3 data, and the built-in GPT API to provide a reliable, versatile way to fulfill such requests.
Scripts can run client-side through a browser extension (Manifest V3) or server-side (Node.js) using the OpenAI and Playwright APIs. A client-side extension runs within the current tab: it captures the DOM, sends it to the agent, and executes the returned command in the browser. Its main limitation is that it can only operate on the current tab. Server-side execution runs all requests from a server, supports a larger number of concurrent requests, and can serve multiple users efficiently.
Users can also start from ready-made solutions: LangChain, the open-source BrowserGPT, or ASCN.AI NoCode (with a visual editor).
Core APIs for Automation:
Everything explained above defines a standard automation architecture:
To obtain dynamic data, use WebSocket interception or direct API calls. ASCN's own Blockchain API gives you access to on-chain data and lets you bypass UI parsing issues entirely.
Main Threats:
Recommended Practical Measures:
Technical limitations: when automating with an LLM, remember that LLM latency (between 0.5 and 3 seconds per call) is a poor fit for high-volume requests, and that context size is limited (GPT-4 currently supports roughly 128k tokens, though usable capacity is lower in practice). A complex DOM can also reduce how accurately the LLM interprets it, and the cost of scaling can become very significant.
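One practical mitigation for the context limit is pruning the DOM snapshot to a token budget before sending it to the model. This sketch uses a rough characters-per-token heuristic and hypothetical relevance scores; real pipelines would use the model's own tokenizer and a proper ranking step.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: roughly 4 characters per token for English/HTML.
    A real pipeline would use the model's tokenizer instead."""
    return max(1, len(text) // 4)

def prune_snapshot(elements, budget_tokens):
    """Keep the highest-relevance elements until the token budget is spent.
    `elements` is a list of (relevance, html_fragment) pairs; relevance
    would come from a heuristic or a cheap pre-ranking model."""
    kept, used = [], 0
    for relevance, text in sorted(elements, key=lambda p: -p[0]):
        cost = estimate_tokens(text)
        if used + cost > budget_tokens:
            continue  # skip fragments that would blow the budget
        kept.append(text)
        used += cost
    return kept

elements = [
    (0.9, "<button id='buy'>Buy BTC</button>"),
    (0.5, "<input id='amount'>"),
    (0.1, "<div class='footer'>" + "x" * 400 + "</div>"),
]
kept = prune_snapshot(elements, budget_tokens=20)
```

Dropping low-relevance boilerplate (footers, ads, navigation chrome) both fits the context window and tends to improve the model's interpretation accuracy.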
Recommendations:
Collecting data without permission, or in violation of applicable regulations, is illegal and can result in fines. Check robots.txt and the site's terms of use, and respect any required delays and request limits. Nothing here is an encouragement to break any rules; the ultimate responsibility rests with the operator. Please note that this information does not constitute legal advice.
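Checking robots.txt can be done with Python's standard library. The robots.txt content, bot name, and URLs below are invented for illustration; in practice you would fetch the file from the target site with `RobotFileParser.set_url()` and `read()`.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content (normally fetched from https://site/robots.txt).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

allowed = rp.can_fetch("MyBot", "https://example.com/listings")
blocked = rp.can_fetch("MyBot", "https://example.com/private/data")
delay = rp.crawl_delay("MyBot")  # seconds to wait between requests
```

Honoring `crawl_delay` in your request loop is the simplest way to comply with a site's stated rate limits.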
GPT automation runs in ordinary browsers through driver tools such as Selenium, Puppeteer, and Playwright. Playwright on Chromium is generally the most stable option for about 95% of tasks, and its Firefox and WebKit (Safari) support is comparable. Puppeteer does not cover mobile automation, so use an alternative such as Appium there.
JavaScript/TypeScript are the natural choice for Puppeteer and Playwright projects, while Python is typically used to integrate with the GPT API and Selenium. C# and Java are common in enterprise environments. No-code options are also available.
| Feature | Traditional Automation | GPT Control |
|---|---|---|
| Selector Dependency | Rigid (CSS, XPath) | Flexible (Semantic) |
| Adaptability to Change | Manual Code Changes | Automatic DOM Interpretation |
| Barrier To Entry | Requires Coding | No-Code and Natural Language |
| Speed of Execution | High Speed, No Latency | Medium Speed, Due to LLM Calls |
| Cost | Low Cost, Free Tools | Medium/High Cost due to API Calls |
| Exception Handling | Explicitly Defined, Complex | Automated Attempts with Alternatives |
GPT and AI browser automation are a substantial step forward from brittle, labor-intensive scripting toward adaptive methods that respond flexibly to changing interfaces. Long sessions of troubleshooting and fixing Selenium bugs have given way to finished workflows built in under 10 minutes on no-code platforms. With GPT, the equivalent of hundreds of lines of code shrinks to a single command, and we expect GPT-style automation to become the standard. Innovations just around the corner include multimodal agents, autonomous browser-based assistants, and the combination of blockchain data with smart contracts. With GPT's help you can save dozens of hours or more each month; soon an entire business could be effectively automated.
All information in this article is general in nature and does not replace professional investment, legal, or financial advice. A thorough understanding of each platform and its AI assistants is a prerequisite for using them.