Microsoft Researchers Are Teaching AI to Read Spreadsheets

Getting a generative AI model to understand a spreadsheet can be difficult. To address this problem, Microsoft researchers published a paper on arXiv on July 12 describing SpreadsheetLLM, an encoding framework that enables large language models to “read” spreadsheets.

SpreadsheetLLM could “transform spreadsheet data management and analysis, paving the way for more intelligent and efficient user interactions,” the researchers wrote.

For businesses, one advantage of SpreadsheetLLM would be the ability to work with spreadsheet formulas without first learning them, by asking the AI model questions in natural language.

Why are spreadsheets a challenge for LLMs?

Spreadsheets are a challenge for LLMs for several reasons.

Spreadsheets can be very large, exceeding the number of tokens an LLM can process at one time.
Spreadsheets have “two-dimensional layouts and structures,” as the paper puts it, as opposed to the “linear and sequential input” LLMs work well with.
LLMs aren’t usually trained to interpret cell addresses and specific spreadsheet formats.
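To make the mismatch concrete, here is a minimal sketch of what happens when a two-dimensional grid is flattened into the linear text an LLM consumes. The `address,value` format and the function names `column_letter` and `serialize_grid` are assumptions chosen for illustration, not the encoding described in the paper.

```python
def column_letter(index: int) -> str:
    """Convert a zero-based column index to a spreadsheet letter (0 -> A, 26 -> AA)."""
    letters = ""
    index += 1
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def serialize_grid(grid: list[list[str]]) -> str:
    """Flatten a 2D grid into one line of 'address,value' pairs, row by row."""
    parts = []
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            parts.append(f"{column_letter(c)}{r + 1},{value}")
    return "|".join(parts)


# Hypothetical three-by-three sheet used only for this example.
grid = [
    ["Region", "Q1", "Q2"],
    ["North", "100", "120"],
    ["South", "90", "95"],
]
flat = serialize_grid(grid)
print(flat)
```

Every cell costs an address plus a value, so the text grows with rows times columns, and vertically adjacent cells such as B2 and B3 end up far apart in the flattened string.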

Microsoft researchers used a multistep technique to parse spreadsheets

There are two main parts of SpreadsheetLLM:

SheetCompressor, which is a framework to shrink spreadsheets down into formats LLMs can understand.
Chain of Spreadsheet, which is a methodology for teaching an LLM how to identify the right parts of a compressed spreadsheet to “look at” when presented with a question and for generating a response.
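The two-stage idea behind Chain of Spreadsheet, first picking the relevant cells and then answering from only those cells, can be sketched roughly as follows. This is an illustration under stated assumptions rather than the researchers' implementation: the prompts, the function `chain_of_spreadsheet`, and the stand-in model `fake_llm` are all hypothetical, and a real system would call an actual LLM.

```python
from typing import Callable


def chain_of_spreadsheet(
    question: str, sheet: dict[str, str], llm: Callable[[str], str]
) -> str:
    """Two passes through a model: locate relevant cells, then answer from them."""
    # Stage 1: show the (compressed) sheet and ask which cells matter.
    listing = "\n".join(f"{addr}: {val}" for addr, val in sheet.items())
    picked = llm(
        f"Question: {question}\nCells:\n{listing}\n"
        "Reply with the relevant cell addresses, comma-separated."
    )
    wanted = {addr.strip() for addr in picked.split(",")}

    # Stage 2: answer using only the cells selected in stage 1.
    subset = "\n".join(
        f"{addr}: {val}" for addr, val in sheet.items() if addr in wanted
    )
    return llm(
        f"Question: {question}\nRelevant cells:\n{subset}\nAnswer the question."
    )


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model so the sketch runs offline."""
    if "Reply with the relevant cell addresses" in prompt:
        return "A2, B2"
    # Echo back the cells the second prompt contained.
    return prompt.split("Relevant cells:\n")[1].split("\nAnswer")[0]


sheet = {
    "A1": "Region", "B1": "Sales",
    "A2": "North", "B2": "100",
    "A3": "South", "B3": "90",
}
answer = chain_of_spreadsheet("What were North's sales?", sheet, fake_llm)
print(answer)
```

The point of the split is that the second prompt contains only two cells instead of the whole sheet, which keeps the final question small even when the spreadsheet is not.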

A diagram of how the SpreadsheetLLM framework “reads” a spreadsheet by performing multiple processes. Image: Microsoft

SheetCompressor has three modules:

Structural anchors that help LLMs identify the rows and columns in the spreadsheet.
A method for reducing the number of tokens it costs for the LLM to interpret the spreadsheet.
A technique for improving efficiency by clustering similar cells together.
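One simple way to picture how grouping similar cells saves tokens is an index that stores each distinct value once, alongside every address that holds it. The sketch below is a simplified illustration, not Microsoft's SheetCompressor code; the function name `invert_cells` and the sample data are assumptions.

```python
from collections import defaultdict


def invert_cells(cells: dict[str, str]) -> dict[str, list[str]]:
    """Map each distinct non-empty value to the addresses that contain it."""
    index: defaultdict[str, list[str]] = defaultdict(list)
    for address, value in cells.items():
        if value != "":  # empty cells carry no information, so skip them
            index[value].append(address)
    return dict(index)


# Hypothetical status column with repeated values and one empty cell.
cells = {
    "A1": "Status",
    "A2": "Open",
    "A3": "Open",
    "A4": "Closed",
    "A5": "Open",
    "B2": "",
}
compact = invert_cells(cells)
print(compact)
```

In this toy sheet, "Open" is written once instead of three times and the empty cell disappears entirely. The savings grow with how repetitive and sparse the spreadsheet is.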

Using these modules, the team reduced the tokens needed for spreadsheet encoding by 96%. This, in turn, enabled a slight (12.3%) improvement over another leading research team’s work on helping LLMs understand spreadsheets. The researchers tried their spreadsheet identification method with these LLMs:

OpenAI’s GPT-4 and GPT-3.5.
Meta’s Llama 2 and Llama 3.
Microsoft’s Phi-3.
Mistral AI’s Mistral-v2.

For the Chain of Spreadsheet capabilities, they used GPT-4.
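To put the 96% token reduction reported above in perspective, the arithmetic can be worked through in a few lines. The 100,000-token starting size is a made-up example, and the helper names are assumptions for illustration.

```python
def tokens_after_reduction(tokens_before: int, reduction: float) -> int:
    """Tokens remaining after cutting the given fraction (0.96 means 96%)."""
    return round(tokens_before * (1 - reduction))


def compression_ratio(reduction: float) -> float:
    """How many times smaller the encoding becomes."""
    return 1 / (1 - reduction)


# Hypothetical sheet that would cost 100,000 tokens without compression.
remaining = tokens_after_reduction(100_000, 0.96)
ratio = compression_ratio(0.96)
print(remaining, round(ratio, 1))
```

A 96% cut leaves 4% of the original tokens, which works out to an encoding roughly 25 times smaller.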

What does SpreadsheetLLM mean for Microsoft’s AI efforts?

The obvious advantage for Microsoft here is in enabling its AI assistant Copilot, which works in many Microsoft 365 suite applications, to do more in Excel. SpreadsheetLLM represents the ongoing effort to make generative AI practical – and opening up Excel to people who haven’t been trained on its more advanced features might be a good niche for generative AI to expand into.


Real-world usage and next steps for this Microsoft research

A 12.3% improvement over a previous, leading research team’s findings is more academically significant than economically significant for now. Generative AI is infamous for making things up, and hallucinations cascading through a spreadsheet could render huge swaths of data useless. As the researchers point out, getting an LLM to understand a spreadsheet’s format – that is, what a spreadsheet usually looks like and how it functions – is different from getting the LLM to generate comprehensible, accurate data inside those cells.

In addition, this methodology takes a lot of computing power and multiple passes through an LLM to generate an answer. Meanwhile, your office’s Excel wizard might be able to pull an answer in a few minutes without using nearly as much energy.

Going forward, the research team wants to include a way to encode details like the background color of cells and to deepen the LLMs’ understanding of how words within the cells relate to one another.

TechRepublic has reached out to Microsoft for more information.

Megan Crouse