A team of researchers from Nanjing University and the University of Sydney has developed A2, an AI agent that identifies and verifies vulnerabilities in Android applications. A2 achieved 78.3% coverage on the Ghera test suite, compared with 30% for the static analyzer APKHunt. Run on 169 real APKs, A2 found 104 zero-day vulnerabilities, 57 of which were confirmed through automatically generated working exploits. Among them was a medium-severity bug in an app with over 10 million installs: an intent redirection issue that could allow malware to gain control.
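The headline figures are easier to compare once reduced to two ratios. The short Python calculation below uses only the numbers reported above; the variable names are illustrative and not taken from the study.

```python
# Figures as reported for A2; variable names are illustrative only.
ghera_coverage_a2 = 78.3        # percent coverage on the Ghera test suite
ghera_coverage_apkhunt = 30.0   # percent coverage for APKHunt
found = 104                     # zero-day vulnerabilities found in 169 APKs
confirmed = 57                  # confirmed by an automatically generated exploit

confirmation_rate = confirmed / found
unconfirmed = found - confirmed
coverage_ratio = ghera_coverage_a2 / ghera_coverage_apkhunt

print(f"Confirmed by exploit: {confirmation_rate:.1%}")          # 54.8%
print(f"Found but not confirmed: {unconfirmed}")                 # 47
print(f"Coverage relative to APKHunt: {coverage_ratio:.2f}x")    # 2.61x
```

In other words, slightly more than half of the reported findings came with a working exploit, and the reported Ghera coverage is about 2.6 times that of APKHunt.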
A2 features a validation module that was absent in its predecessor, A1, which used a fixed verification scheme that only assessed whether an attack would be profitable. A2 confirms each vulnerability step by step, with automatic verification at each stage. In one example cited by the authors, for an app that stored an AES key in clear text, A2 locates the key in the strings.xml file, uses it to forge a password reset token, and then verifies that the token bypasses authentication. A2 combines several language models, including OpenAI o3, Gemini 2.5 Pro, Gemini 2.5 Flash, and GPT-oss-120b, and assigns them specific roles: a planner develops the attack strategy, an executor carries out the actions, and a validator confirms the result. Vulnerability detection costs between $0.0004 and $0.03 per app depending on the model used, while a complete cycle with verification averages $1.77. Using only Gemini 2.5 Pro raises the cost to $8.94 per bug.
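The step-by-step validation can be pictured as a chain of tasks, each paired with its own independent check, where the chain stops as soon as a check fails. The Python sketch below is a toy illustration loosely modeled on the clear-text key example (locate the key in strings.xml, forge a reset token, verify the bypass). It is not A2's code: the XML content, the key name reset_key, the function app_accepts_token, and the use of HMAC-SHA256 in place of AES (chosen only to stay within the standard library) are all assumptions made for illustration.

```python
import hashlib
import hmac
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable

# Toy stand-in for a decompiled resource file; key name and value are made up.
STRINGS_XML = """<resources>
    <string name="app_name">DemoApp</string>
    <string name="reset_key">c2VjcmV0LWtleS0xMjM0</string>
</resources>"""


def app_accepts_token(user: str, token: str) -> bool:
    """Hypothetical model of the app's reset check, keyed with the hardcoded secret."""
    secret = b"c2VjcmV0LWtleS0xMjM0"
    expected = hmac.new(secret, user.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, token)


@dataclass
class Step:
    name: str
    action: Callable[[dict], None]  # performs the task, writing into shared state
    check: Callable[[dict], bool]   # verifies the task's result independently


def find_key(state: dict) -> None:
    # Step 1: look for the hardcoded key among the string resources.
    node = ET.fromstring(STRINGS_XML).find("./string[@name='reset_key']")
    state["key"] = node.text.encode() if node is not None and node.text else b""


def forge_token(state: dict) -> None:
    # Step 2: forge a reset token with the recovered key (HMAC as an AES stand-in).
    state["token"] = hmac.new(
        state["key"], state["user"].encode(), hashlib.sha256
    ).hexdigest()


def try_bypass(state: dict) -> None:
    # Step 3: submit the forged token to the (toy) application check.
    state["bypassed"] = app_accepts_token(state["user"], state["token"])


STEPS = [
    Step("locate key", find_key, lambda s: len(s["key"]) > 0),
    Step("forge token", forge_token, lambda s: len(s["token"]) == 64),
    Step("bypass auth", try_bypass, lambda s: s["bypassed"] is True),
]


def run_pipeline(user: str) -> list[tuple[str, bool]]:
    """Run each step, verify it, and stop at the first failed check."""
    state = {"user": user}
    results = []
    for step in STEPS:
        step.action(state)
        ok = step.check(state)
        results.append((step.name, ok))
        if not ok:
            break
    return results


if __name__ == "__main__":
    for name, ok in run_pipeline("alice"):
        print(f"{name}: {'verified' if ok else 'failed'}")
```

Pairing every action with a separate check is what distinguishes a confirmed finding from a bare alert: a report is only produced when the last step in the chain has been verified.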
Experts note that A2 already outperforms existing Android static analyzers and that its approach can speed up the work of both researchers and hackers, who can call the APIs of pre-trained models instead of building complex tools. One concern remains: bounty hunters can use A2 to earn rewards, but bounty programs do not cover all bugs, which leaves attackers room to exploit the rest directly. The source code is currently available only to researchers with official partnerships.