Tool-Augmented Agentic AI: A Survey on Composition, Selection, and Integration

  • Abdullah Yousafi, Beloit College
  • Nabiha Fatima
  • Mehmet Dik

Abstract

Agentic AI systems enhanced by tool use represent a shift in how large language models are designed and deployed. Through multi-step reasoning and dynamic interaction with external tools, ranging from APIs to simulators, they can execute tasks that would otherwise be beyond the capabilities of standalone systems. This survey examines the rapidly expanding body of research on tool-augmented agentic AI, with a focus on how tools are composed, selected, and integrated into agent architectures. It provides a unifying taxonomy and actionable guidance for navigating this complex landscape. The goal is not to be exhaustive, but to synthesize findings and relevant frameworks into structured guidance for researchers, practitioners, and policymakers working on robust, safe, and scalable agentic AI.

Published
2026-04-09
Section
Special Issue: Advances in Mathematical Sciences