With the development of artificial intelligence (AI), particularly in natural language processing and machine learning, AI applications in code generation, error correction, and programming assistance have become more common. However, differences in code generation capabilities among models influence their practical applicability in programming tasks. To investigate this issue, this study evaluates the performance of five state-of-the-art large language models (LLMs), namely GPT-4o, OpenAI o1, OpenAI o1 Pro, Claude 3.5, and Gemini 2.0, through a systematic comparative analysis across three programming languages: Python, Java, and Swift. The evaluation framework considers multiple aspects, including overall accuracy, code efficiency, time complexity, space complexity, and multi-solution generation capabilities. The experimental results reveal substantial variations among models: OpenAI o1 Pro and Gemini 2.0 achieve the highest accuracy, GPT-4o generates the most concise code, and Claude 3.5 produces the greatest number of alternative solutions. However, all models exhibit lower performance in Swift compared to Python and Java, likely due to the limited availability of training data in Swift. An in-depth error analysis identifies differences in model adaptability across programming languages and highlights key limitations of AI-assisted programming. These findings provide insights for developers and users of AI-assisted programming tools, supporting more informed decision-making in selecting and applying these technologies in different programming contexts.