
1. Why Does Excel Formula Generation Need AI to Break Through the Bottleneck?
In daily work and study, spreadsheet tools like Excel and Google Sheets are core carriers for data processing, and formulas are the key to unlocking their data analysis capabilities. However, for most ordinary users, writing Excel formulas is a frustrating task: complex function syntax, precise cell references, and tedious logical combinations make it time-consuming, labor-intensive, and highly error-prone. Minor mistakes can lead to incorrect results, while major ones may affect decision-making. Even simple numerical calculations can go wrong because of an incorrect cell range or a misused operator, and complex formulas involving multi-condition filtering, data sorting, and statistical analysis often require professional expertise to master.
This pain point has created strong demand for intelligent formula generation tools, and the rapid development of AI technology offers a solution. The NL2FORMULA task targets the core need of ‘converting natural language to Excel formulas’: users describe their data requirements in plain language, and AI automatically generates executable Excel formulas. As a brand rooted in the AI tool sector, PopAI has responded to this demand by integrating the AI Excel Formula Generator into its core function matrix. Drawing on its experience in AI chat, content generation, and document processing, PopAI has made converting natural language to Excel formulas smoother and more accurate, lowering the barrier between ordinary users and professional Excel formulas.
2. The NL2FORMULA Dataset: What is the ‘Training Cornerstone’ of AI Excel Formula Generation?
The accuracy of any AI tool relies on high-quality datasets, and the core capability of the AI Excel Formula Generator is built on the NL2FORMULA dataset. The dataset contains 70,799 pairs of natural language queries and corresponding Excel formulas, covering 21,670 tables and 37 types of formula functions. It covers two core scenarios: analytical queries (such as combinations of functions like MAXIFS, AVERAGE, and SORT) and numerical calculations (addition, subtraction, multiplication, and division). In addition, analytical queries are divided into three levels (simple, medium, and complex) based on the number of functions and the complexity of the filtering conditions, so that the trained model can handle requirements of varying difficulty.
The dataset was constructed so as to avoid the high cost of manual annotation: by analyzing the grammatical logic of SQL and Excel formulas, the R&D team formulated a set of precise conversion rules and used them to batch-convert SQL queries from mature TEXT2SQL datasets (such as WikiSQL and Spider) into Excel formulas. To make up for the lack of simple numerical calculations in the converted data, a large number of numerical operation samples from the TATQA dataset were added manually, ultimately forming a dataset with both breadth and depth.
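To make the idea of a SQL-to-formula conversion rule concrete, here is a minimal Python sketch. It is not the dataset's actual rule set: the function name sql_to_formula, the two mapping tables, and the restriction to one aggregate with at most one equality condition are all assumptions made for illustration.

```python
# Hypothetical rule tables: SQL aggregate -> Excel function name.
PLAIN = {"MAX": "MAX", "MIN": "MIN", "SUM": "SUM", "AVG": "AVERAGE"}
CONDITIONAL = {"MAX": "MAXIFS", "MIN": "MINIFS", "SUM": "SUMIFS", "AVG": "AVERAGEIFS"}


def sql_to_formula(agg, target_range, cond_range=None, cond_value=None):
    """Convert one aggregate with an optional equality condition into a formula.

    target_range is the cell range of the selected column; cond_range is the
    cell range of the WHERE column (both assumed to be known in advance).
    """
    if cond_range is None:
        return f"={PLAIN[agg]}({target_range})"
    # Text criteria are quoted in Excel; numbers are written as-is.
    if isinstance(cond_value, str):
        criterion = f'"{cond_value}"'
    else:
        criterion = str(cond_value)
    return f"={CONDITIONAL[agg]}({target_range},{cond_range},{criterion})"


# SELECT MAX(sales) FROM t WHERE region = 'East'
print(sql_to_formula("MAX", "C2:C10", "B2:B10", "East"))
# SELECT AVG(age) FROM t
print(sql_to_formula("AVG", "D2:D10"))
```

A real rule set would also have to cover multiple conditions, comparison operators, sorting, and nested queries, which this sketch leaves out.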
When training its own AI Excel Formula Generator, PopAI leveraged the advantages of the NL2FORMULA dataset. On one hand, the dataset’s large number of samples enabled the model to learn the correspondence between natural language intentions and formula logic. On the other hand, the dataset’s quality verification mechanism, which screens samples through actual Excel execution tests and manual reviews by professionals, ensures that the ‘raw materials’ for model training are reliable. Furthermore, PopAI used its own intelligent document management function to process the dataset into a structured form, allowing the model to extract key training information more efficiently and further improve the accuracy of formula generation.
3. PopAI’s AI Excel Formula Generator: How Does the Technology Realize ‘Natural Language to Formula’ Conversion?
PopAI’s AI Excel Formula Generator is based on a sequence-to-sequence (Seq2Seq) framework, drawing on the technical ideas of the fCODER model and optimized in combination with PopAI’s own AI function ecosystem. This makes the conversion from ‘natural language + tabular data’ to ‘Excel formula’ both accurate and efficient. The implementation consists of three core steps:
First is the input processing stage. Users only need to enter a natural language query (such as ‘Calculate the proportion of IMFT’s fixed assets to total assets in 2018’) through PopAI’s AI chat function and upload the corresponding Excel table (or import table data from files, Google Cloud, or URLs via PopAI’s flexible upload options). The system automatically fuses the natural language and the table data: it separates the two sequences with dedicated symbols, then converts the text and table information into vectors the model can process using word embeddings and positional embeddings. In this process, PopAI’s intelligent reading comprehension function plays a crucial role, quickly parsing the table’s row and column structure and the meaning of its data to support subsequent formula generation.
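The fusion step described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the separator symbols [NL], [TAB], and [ROW], the ‘column: value’ cell layout, and the function names are hypothetical, and the table values are invented sample data. The embedding step itself is not shown.

```python
def linearize_table(header, rows):
    """Flatten a table row by row, tagging each cell with its column name."""
    parts = []
    for row in rows:
        cells = [f"{name}: {value}" for name, value in zip(header, row)]
        parts.append(" | ".join(cells))
    return " [ROW] ".join(parts)


def build_model_input(query, header, rows):
    """Join the query and the flattened table with separator symbols.

    The symbols [NL] and [TAB] are assumed markers that let a model tell
    the natural language sequence apart from the table sequence.
    """
    return f"[NL] {query} [TAB] {linearize_table(header, rows)}"


header = ["Year", "Fixed assets", "Total assets"]
rows = [[2018, 120, 480], [2019, 150, 500]]  # invented sample values
text = build_model_input("Proportion of fixed assets to total assets in 2018", header, rows)
print(text)

# Each whitespace-separated token gets a position index; a model would look up
# a word embedding and a positional embedding for every (token, position) pair.
tokens = text.split()
positions = list(range(len(tokens)))
```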
Second is the encoding and decoding stage. The encoder processes the fused input vectors and captures the associations between the natural language intention and the table data (such as which column in the table corresponds to ‘fixed assets’ and which row corresponds to ‘2018’). The decoder then generates the Excel formula one component at a time based on the encoding results, from function selection (such as SUM, FILTER, MINIFS) to cell range positioning (such as C8:C10) and operator combination, with each step matched to the user’s needs. PopAI also integrates AI-enhanced content technology to check the logic of the formula during generation, reducing syntax errors and data range deviations.
Finally comes the model optimization stage. PopAI trains the model with a cross-entropy loss function, continually optimizing parameters to improve the accuracy of formula generation. At the same time, using its own AI-driven content editing function, it performs a second pass over the generated formulas, not only ensuring that they execute normally in Excel but also adjusting their expression to the user’s scenario so that they are more concise and understandable. For example, if the user needs the formula results for a PPT presentation, PopAI can convert the formulas and calculation results into slides through its automatic presentation conversion function, paired with relevant images to make the presentation more visually appealing.
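As a rough illustration of the cross-entropy objective mentioned above, the following Python sketch computes the loss for a single formula. It assumes the decoder outputs a probability distribution over a small token vocabulary at each step; the function name, the toy vocabulary, and the probabilities are invented for the example and do not come from PopAI's model.

```python
import math


def cross_entropy(step_probs, target_tokens):
    """Average negative log-probability assigned to the reference tokens.

    step_probs[i] maps each candidate token to its predicted probability at
    decoding step i; target_tokens[i] is the correct token for that step.
    """
    total = 0.0
    for probs, token in zip(step_probs, target_tokens):
        total += -math.log(probs[token])
    return total / len(target_tokens)


# Toy decoder output for the reference formula tokens ["SUM", "(", "C8:C10", ")"].
step_probs = [
    {"SUM": 0.5, "MAX": 0.5},
    {"(": 1.0},
    {"C8:C10": 0.25, "C8:C9": 0.75},
    {")": 1.0},
]
target = ["SUM", "(", "C8:C10", ")"]
print(cross_entropy(step_probs, target))
```

The loss falls toward zero as the model puts more probability on each correct token, which is what training on the formula pairs pushes it to do.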

4. Practical Verification: How Powerful is PopAI’s AI Excel Tool in Real-World Use?
To test the actual performance of the AI Excel Formula Generator, the PopAI team conducted multi-dimensional experiments, comparing it with mainstream models in the industry (such as FORTAP and GPT-3.5). The results showed a clear advantage, supporting its reliability in practical applications.
In terms of core indicators, PopAI’s AI Excel Formula Generator performed well in both Exact Match (EM) accuracy and Execution Result Assessment (ERA) accuracy: on the test set, its large model version achieved an EM accuracy of 70.6% and an ERA accuracy of 77.1%, far exceeding FORTAP’s 24.2% EM accuracy and GPT-3.5 (10-shot)’s 21.4% EM accuracy. This means that, in most cases, the formulas generated from natural language both match the structure and details of the reference formulas and return correct results in Excel without additional manual modification.
PopAI also performed well across formula generation scenarios of different difficulties and types. For analytical query formulas, whether simple tasks involving 1–2 functions (such as ‘Calculate the average age of all employees’), medium-difficulty tasks involving 3–4 functions (such as ‘Filter products with sales exceeding 1 million in 2023 and calculate their average profit’), or complex tasks involving more than 4 functions (such as ‘Sort by region first, then filter products with inventory less than 50 and count their total sales volume’), PopAI’s accuracy led the compared models, with the ERA accuracy for medium-difficulty tasks reaching 88.7%. For numerical calculation formulas covering addition, subtraction, multiplication, division, and combined operations, PopAI’s accuracy reached 79.5%, addressing the calculation logic errors that ordinary users are prone to make.
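The difference between the two metrics can be shown with a short Python sketch. It assumes that each test sample already carries the predicted and reference formulas together with the values they returned when executed in Excel; the field names and the whitespace-insensitive string comparison are assumptions made for this illustration.

```python
def accuracy(flags):
    """Percentage of True values in a list of booleans."""
    return 100.0 * sum(flags) / len(flags)


def evaluate(samples):
    """Return (EM accuracy, ERA accuracy) in percent.

    EM compares formula strings (spaces ignored, an assumption here);
    ERA compares the values obtained by executing the formulas.
    """
    em = [s["pred"].replace(" ", "") == s["gold"].replace(" ", "") for s in samples]
    era = [s["pred_result"] == s["gold_result"] for s in samples]
    return accuracy(em), accuracy(era)


samples = [
    # Same formula text, same result: counts for both EM and ERA.
    {"pred": "=MAX(C2:C10)", "gold": "=MAX(C2:C10)", "pred_result": 90, "gold_result": 90},
    # Different formula text, same result: counts for ERA only.
    {"pred": "=A2+A3", "gold": "=SUM(A2:A3)", "pred_result": 30, "gold_result": 30},
]
print(evaluate(samples))  # (50.0, 100.0)
```

Because a differently written formula can still return the right value, ERA accuracy is generally at least as high as EM accuracy, which matches the 77.1% versus 70.6% figures reported above.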
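The three difficulty levels can be approximated by counting function calls in a formula, as in the Python sketch below. This is a simplification: it uses only the function-count thresholds quoted above (1–2, 3–4, more than 4) and ignores the complexity of filtering conditions. The regular expression and the example formulas are illustrative assumptions.

```python
import re


def count_functions(formula):
    """Count function calls: a name immediately followed by '('."""
    return len(re.findall(r"[A-Za-z][A-Za-z0-9.]*\(", formula))


def difficulty(formula):
    """Map an analytical formula to a level by its number of functions."""
    n = count_functions(formula)
    if n <= 2:
        return "simple"
    if n <= 4:
        return "medium"
    return "complex"


print(difficulty("=AVERAGE(B2:B50)"))                                    # simple
print(difficulty("=SUM(INDEX(SORT(FILTER(A2:C50,C2:C50<50),1),0,2))"))  # medium
```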
Notably, PopAI has also optimized for the common practical scenario of table position changes. Since Excel tables can be flexibly adjusted in position (such as moving from cell A1 to B2), traditional models often fail to generate valid formulas due to incorrect cell index judgment. However, by intelligently identifying table structure and data associations, PopAI can generate correct formulas even when the table position changes, greatly improving the tool’s practicality.
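To see why table position matters, consider the Python sketch below, which rewrites the cell references of a formula when the whole table is moved by a given number of rows and columns. It is not PopAI's method, only an illustration of the index arithmetic involved; the function names are hypothetical, and the simple pattern would also rewrite function names that look like references (such as LOG10).

```python
import re


def col_to_num(col):
    """Convert a column label to a 1-based index: A -> 1, Z -> 26, AA -> 27."""
    n = 0
    for ch in col:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def num_to_col(n):
    """Convert a 1-based column index back to its label."""
    label = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        label = chr(ord("A") + rem) + label
    return label


def shift_formula(formula, d_rows, d_cols):
    """Shift every relative A1-style reference by d_rows rows and d_cols columns."""
    def shift(match):
        col, row = match.group(1), int(match.group(2))
        return f"{num_to_col(col_to_num(col) + d_cols)}{row + d_rows}"
    return re.sub(r"\b([A-Z]{1,3})([0-9]+)\b", shift, formula)


# Table moved from A1 to B2: one row down, one column right.
print(shift_formula("=SUM(C8:C10)/C12", 1, 1))  # =SUM(D9:D11)/D13
```

A model that memorizes absolute cell indices would keep emitting C8:C10 after the move, which is exactly the failure described above.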
5. Multi-Scenario Application: How Does PopAI Make AI Excel Formula Generation Benefit Different Users?
PopAI’s AI Excel Formula Generator is not a single-function tool: it is integrated into PopAI’s full-scenario AI ecosystem and provides tailored solutions for different users’ needs. This brings the ‘natural language to Excel formula’ capability into a wide range of work, study, and research scenarios.
For office workers, data analysis is a core part of daily work. For example, marketing specialists need to analyze marketing data from various channels. They only need to input ‘Calculate the conversion rate ranking of each channel in the first quarter of 2024’ through PopAI’s AI chat function and upload the table containing channel information, exposure volume, and click volume. The system can quickly generate the corresponding sorting and calculation formulas. After generating the formulas, users can also use PopAI’s AI writing function to automatically generate data analysis reports, adjusting the report format and word count through the flexible document creation function. If a team presentation is needed, the report can be converted into a PPT presentation with one click. PopAI will optimize the layout according to the audience’s characteristics and match relevant images to make the presentation more persuasive.
For students, Excel formulas are commonly used in statistics assignments and academic research, but complex formula logic is often a learning difficulty. PopAI’s AI Excel Formula Generator can help students overcome this bottleneck: for example, when completing statistics homework, students only need to input ‘Calculate the median and standard deviation of the sample data’ and upload the data table to obtain accurate formulas. Through PopAI’s intelligent reading comprehension function, they can also get detailed explanations of the formulas and the logic behind them, which helps them understand the principles rather than memorize them by rote. In addition, students can use PopAI’s AI rewriter to make the analysis text that accompanies the formulas more professional, improving the quality of homework or papers.
For professionals such as financial and audit personnel, the accuracy of Excel formulas directly affects the reliability of their work. For example, financial personnel may need to calculate the proportion of expenses in each department of the enterprise. After they input ‘Calculate the proportion of each department’s administrative expenses to the total administrative expenses in 2023’ through PopAI, the system generates formulas that can be used directly in financial statement preparation. With PopAI’s automatic save management function, formulas and report data are saved in real time to avoid data loss. The one-click annotation function can also be used to mark the calculation logic of the formulas, facilitating subsequent audits or handovers and supporting systematic knowledge organization.
6. Current Challenges and Future Directions: How Can AI Excel Formula Generation Be Further Upgraded?
Although PopAI’s AI Excel Formula Generator has performed excellently, there are still areas for optimization in practical applications, which are also the core directions for future technological upgrades.
Currently, the tool’s main challenges fall into three areas. First, limited adaptability to table structures: the table structures in the existing training data are relatively fixed, and the accuracy of formula generation needs to be improved for special forms such as horizontal tables and irregular tables. Second, insufficient coverage of formula functions: the tool currently supports mainly 37 commonly used functions, and support for functions used in string processing (such as FIND, LEN) and data cleaning (such as REPLACE, CONCATENATE) is insufficient. Third, input length limitations: because of the model’s input character limit, information may be encoded incompletely for extra-large tables or ultra-long natural language queries.
To address these challenges, PopAI has formulated a clear upgrade plan: first, expand the coverage of the dataset to include more samples of horizontal, vertical, and irregular tables, enabling the model to adapt to more complex table structures. Second, supplement function samples for scenarios such as string processing and data cleaning, improve the formula function library, and meet more segmented needs. Finally, optimize the model’s input processing mechanism, combine its own AI video and image functions, explore visual input methods for table data, break through the character length limit, and allow users to upload tables in the form of screenshots or videos to further lower the usage threshold.
In addition, PopAI plans to explore multi-table linkage formula generation, which would support formulas based on data associations across multiple tables in the same Excel file, such as ‘Calculate the proportion of each product’s sales in Table A to the total sales in Table B.’ It also plans to integrate formula generation with AI data analysis, not only generating formulas but also automatically interpreting calculation results and providing data insight suggestions, upgrading the tool from a ‘formula generator’ to an ‘intelligent data analysis assistant.’
Conclusion
Excel formula generation was once a ‘technical threshold’ that troubled countless users. Relying on the NL2FORMULA dataset, a sequence-to-sequence model architecture, and the integration advantages of its own full-scenario AI ecosystem, PopAI’s AI Excel Formula Generator has broken through this bottleneck. It allows ordinary users to obtain accurate Excel formulas through natural language without mastering complex formula syntax. Combined with functions such as AI chat, document management, and PPT generation, it makes the full process intelligent, from data input and formula generation to result presentation.
In the future, with continuous technological upgrades, PopAI will further improve the AI Excel Formula Generator and expand its application scenarios, bringing AI Excel technology to more users. Office workers, students, and professional practitioners alike can rely on PopAI’s intelligent tools to free themselves from tedious formula writing, focus more energy on core work and study, and realize the goal of ‘empowering efficiency with AI and serving people with technology.’

