TL;DR
A team of researchers has developed an AI model that encapsulates the knowledge of human cooking in only 2MB of data. This model is trained on over 4 million recipes across multiple languages, representing a significant advance in culinary AI. The development raises questions about the potential and limitations of data compression in culinary knowledge.
Researchers have developed a new AI model that compresses an ingredient-level picture of human cooking into just 2 megabytes of data, based on a multilingual recipe corpus. The result suggests that complex, culturally diverse culinary information can be encoded very compactly, with implications for AI-driven cooking assistance and culinary research.
The model, named Epicure, is a family of three sibling skip-gram ingredient embeddings trained from scratch on a dataset of more than 4.14 million recipes sourced from 11 platforms in seven languages, including English, Chinese, and Russian. The raw ingredient strings were normalized to 1,790 canonical entries using a pipeline augmented by a large language model (LLM). The resulting data includes an ingredient-ingredient co-occurrence graph with over 200,000 edges and a flavor-ingredient compound graph with over 80,000 edges, spanning 2,247 compound nodes across 15 categories.
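To make the co-occurrence graph concrete, here is a minimal sketch of how such edges could be counted once ingredient strings have been normalized. The function name, the `min_count` parameter, and the toy recipes are illustrative assumptions, not details from the Epicure pipeline.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(recipes, min_count=1):
    """Count how often each unordered ingredient pair shares a recipe.

    `recipes` is an iterable of lists of already-normalized ingredient names.
    Returns a dict mapping (ingredient_a, ingredient_b) to a count.
    """
    counts = Counter()
    for ingredients in recipes:
        # Deduplicate and sort so each pair has a single canonical order.
        unique = sorted(set(ingredients))
        counts.update(combinations(unique, 2))
    # Drop rare pairs; the threshold here is a hypothetical knob.
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Toy data, not drawn from the actual corpus.
recipes = [
    ["tomato", "basil", "olive oil"],
    ["tomato", "basil", "garlic"],
    ["garlic", "olive oil"],
]
edges = cooccurrence_edges(recipes)
for pair, n in sorted(edges.items()):
    print(pair, n)
```

Each key is one weighted edge of the graph. Whether the researchers weight or prune edges this way is not stated in the source.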
Using these data, the researchers trained three variants of a Metapath2Vec model, a technique that learns skip-gram embeddings from random walks over a graph with several node types. The variants differ in focus: one uses co-occurrence data, one uses chemical compound relationships, and a hybrid blends both. Together they aim to capture different aspects of culinary knowledge, from recipe context to chemical composition, within a very small data footprint.
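Metapath2Vec-style training generally starts from random walks that follow a fixed pattern of node types, called a metapath. The sketch below shows that walk step only, on a made-up graph. The node names, the `NEIGHBORS` layout, and the two metapaths are assumptions for illustration; the source does not say which metapaths Epicure uses.

```python
import random

# Hypothetical toy graph: neighbors keyed by (node, type of neighbor wanted).
NEIGHBORS = {
    ("tomato", "ingredient"): ["basil", "garlic"],
    ("basil", "ingredient"): ["tomato"],
    ("garlic", "ingredient"): ["tomato"],
    ("tomato", "compound"): ["compound_a"],
    ("basil", "compound"): ["compound_a", "compound_b"],
    ("garlic", "compound"): [],
    ("compound_a", "ingredient"): ["tomato", "basil"],
    ("compound_b", "ingredient"): ["basil"],
}


def metapath_walk(start, metapath, length, rng):
    """Random walk whose node types cycle through `metapath`.

    `metapath[0]` is the type of `start`. For example,
    ["ingredient", "compound"] alternates ingredient and compound nodes,
    while ["ingredient"] stays on ingredient-ingredient edges.
    The walk stops early if the current node has no neighbor of the
    required type.
    """
    walk = [start]
    while len(walk) < length:
        next_type = metapath[len(walk) % len(metapath)]
        candidates = NEIGHBORS.get((walk[-1], next_type), [])
        if not candidates:
            break
        walk.append(rng.choice(candidates))
    return walk


rng = random.Random(0)
print(metapath_walk("tomato", ["ingredient"], 5, rng))              # co-occurrence style
print(metapath_walk("tomato", ["ingredient", "compound"], 5, rng))  # compound style
```

Walks like these would then be treated as sentences and passed to a skip-gram trainer, so that nodes appearing in similar walk contexts end up with similar vectors. A hybrid variant could mix walks from both patterns, though how the researchers blend them is not described here.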
Why It Matters
The work shows that a highly compressed AI model can encode complex, culturally diverse knowledge, in this case human cooking. Such models could power lightweight culinary tools that run on limited hardware, widen access to cooking knowledge, and support research into how AI represents cultural practices. It also raises the question of how far compression can go before a complex human activity loses its nuance.

Background
The project builds on recent advances in large language models and knowledge graph embedding techniques, applying them to culinary data. Prior efforts in AI cooking assistants have relied on larger datasets and more resource-intensive models. This research takes a different approach: it normalizes a large recipe corpus and embeds it in a compact form, which could allow use on mobile devices or embedded systems. The work is part of broader efforts to encode human knowledge efficiently using AI.
“Compressing the essence of human cooking into just 2MB demonstrates the power of modern AI techniques to encode complex cultural knowledge efficiently.”
— Josef Liyanjun Chen, lead researcher
What Remains Unclear
It is not yet clear how well this compressed model performs in practical cooking scenarios or whether it can generate accurate, culturally appropriate recipes. The research is still in the experimental stage, and real-world testing remains to be done. Further details on the model’s capabilities and limitations are expected in upcoming publications.

What’s Next
Next steps include evaluating the model’s performance in real-world cooking tasks, exploring its integration into culinary AI tools, and expanding the dataset to include more diverse cuisines. Researchers may also work on refining the embeddings and testing their utility in various applications such as recipe generation, dietary customization, and culinary education.

Key Questions
How can a 2MB model encode all human cooking knowledge?
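Rather than storing recipes, the model represents each of the 1,790 canonical ingredients as an embedding vector learned from the co-occurrence and flavor-compound graphs. What it keeps is a compact summary of how ingredients and flavors relate across the normalized, multilingual recipe corpus, which is why the footprint stays small.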
Will this model be able to generate new recipes?
The model is still under development, and its primary goal is to encode knowledge rather than generate recipes. Future iterations may add generative capabilities built on the embeddings.
What are the limitations of such a compressed model?
Its accuracy and cultural relevance in practical cooking scenarios are still untested, and it may lack the nuance of human culinary expertise, especially for complex or highly regional dishes.
Could this technology replace traditional recipe databases?
Likely not entirely, but it could complement existing systems by providing a lightweight, efficient way to access culinary knowledge, especially on devices with limited storage.
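For a rough sense of the 2MB figure raised in the first question, the size of an embedding table is simply vocabulary size times vector dimension times bytes per value. The sketch below runs that arithmetic for the 1,790 canonical ingredients. The dimensions, the 32-bit float storage, and the choice to count three tables are all assumptions; the source does not state them, nor whether compound nodes are stored as well.

```python
def embedding_table_bytes(vocab_size, dim, bytes_per_value=4, tables=1):
    """Raw size of `tables` dense embedding tables, ignoring file overhead."""
    return vocab_size * dim * bytes_per_value * tables


VOCAB = 1_790  # canonical ingredients, as reported in the article

# Hypothetical dimensions; the real value is not given in the source.
for dim in (32, 64, 96, 128):
    size = embedding_table_bytes(VOCAB, dim, tables=3)
    print(f"dim={dim}: {size / 1e6:.2f} MB for three tables")
```

Under these assumptions, a vocabulary this small keeps three tables within a few megabytes even at moderate dimensions, which is consistent in order of magnitude with the reported footprint.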
Source: Hacker News