Automatic Emoji Suggestions for a Tiny Private Lists App

I have a small lists webapp that I use privately with my partner. It is just where we keep recurring lists, shopping notes, reminders, and shared household things.

At some point I noticed that lists feel nicer when items have emoji: tomatoes looks better with 🍅 next to it, coffee with ☕, train tickets with 🚆 or 🎟️, and so on.

The annoying part was picking the emoji manually every time.

This felt like a good excuse to build something with AI. I had wanted to do a small AI feature for a while, and emoji suggestions sounded perfect: take a short bit of text, understand what it means, return a matching symbol. My first thought was embeddings: encode every emoji with a model, encode the input text of each new row, and run a similarity search between the vectors. At first, this sounded great…

Embeddings pipeline

I looked up models I could run on my 3070 and found bartowski/Meta-Llama-3.1-8B-Instruct-GGUF on Hugging Face. It was very easy to get it up and running with the transformers library in Python.

One of my first intuitions was that the emoji characters by themselves would not be enough to capture the full semantic depth of each emoji, so I needed a description of each one. I scraped emoji metadata and descriptions from Emojipedia, then tried matching list item text against those descriptions with multilingual sentence embeddings.

Offline pipeline: scrape Emojipedia → emoji data → encode name and description → store embedding.

That worked in the technical sense. I could type a query, embed it, compare it to emoji description embeddings, and get the nearest matches.

Runtime pipeline: user query → encode vector → vector search.

But the results were not good enough for the feature I wanted, and the worst part was needing the compute at all. I host my webapp on Cloudflare Workers for free and have no GPU infrastructure to run models on, and using bigger models for this felt like complete overkill. Assigning the correct emoji should have near-zero latency. Luckily, I knew that the vector search part would be easy: there are only ~4k emojis, so an in-memory data structure suffices.
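To illustrate why the search itself is the easy part, here is a minimal sketch of a brute-force nearest-neighbour lookup over in-memory vectors. The function names and the three-dimensional toy vectors are made up for illustration; real embeddings would come from the encoder and be much longer.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def nearest_emojis(query_vec, emoji_vecs, k=3):
    """Score every stored emoji vector and return the k best (emoji, score) pairs."""
    scored = [(emoji, cosine(query_vec, vec)) for emoji, vec in emoji_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy vectors, invented for this example only.
EMOJI_VECS = {
    "🍅": [0.9, 0.1, 0.0],
    "☕": [0.1, 0.9, 0.1],
    "🚆": [0.0, 0.1, 0.9],
}

best = nearest_emojis([0.8, 0.2, 0.1], EMOJI_VECS, k=2)
print([emoji for emoji, _ in best])
```

With only a few thousand emojis, a linear scan like this is enough and no vector database is needed. The expensive step is producing the query vector, not searching.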

There were several possible issues. Because we use multiple languages, some words are ambiguous: “pan” can be the cooking tool in English or bread in Spanish. So there should be an aspect of personalization that allows me to “correct” any kind of mismatch.

Additionally, sometimes we list things that are hard to assign an emoji to. Movie lists, for example. In this case, embeddings sound like a good option, as we can leverage the encoder model’s intelligence to recognize different aspects of a film and encode them. For example, “Pulp Fiction” could land close to emojis suggesting violence or cigarettes. Most inputs in a lists app are tiny, though: milk, tomatoes, tickets, soap, eggs. Words like these usually, although not always, have a perfect emoji.

I was able to one-shot a working test app for this feature with a single prompt. However, it didn’t perform as I had expected. My main test was the word “medialuna”, which is the Argentinian equivalent of a croissant. It kept getting matched to moon emojis (luna = moon in Spanish). It was also impossible to improve the matching: I could not teach the model that the best possible emoji for a medialuna is the croissant emoji. I stopped and realised that, for the runtime cost of even a small language model, I was not going to get the accuracy I wanted.

Turning Emoji Descriptions Into Keywords

Then I had a much better idea: I could still use an LLM, just not at runtime. The model’s intelligence could generate possible keywords for each emoji, and a simple matching algorithm could then match the query against those keywords.

I took the scraped Emojipedia data and used a small local Llama model to generate weighted keywords for each emoji in three languages we use: English, Spanish, and German.

The output became a table like this:

emoji,emoji_name,emoji_slug,language,word,weight
🥐,Croissant,croissant,en,croissant,1.0
🥐,Croissant,croissant,en,breakfast,0.55
🥐,Croissant,croissant,en,pastry,0.72
🥐,Croissant,croissant,es,medialuna,0.98
🥐,Croissant,croissant,es,croissant,0.85
🥐,Croissant,croissant,es,desayuno,0.5
🥐,Croissant,croissant,de,croissant,0.92
🥐,Croissant,croissant,de,frühstück,0.52
🥐,Croissant,croissant,de,gebäck,0.80

So instead of asking a model at runtime, we use a generated multilingual keyword dataset.
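As a rough sketch of how such a file could be read, the snippet below parses a few rows of the table with Python's csv module and groups them by language. The function name and the output shape are assumptions for illustration, not the app's actual loader.

```python
import csv
import io
from collections import defaultdict

# A few rows copied from the table above.
SAMPLE_CSV = """emoji,emoji_name,emoji_slug,language,word,weight
🥐,Croissant,croissant,en,croissant,1.0
🥐,Croissant,croissant,en,pastry,0.72
🥐,Croissant,croissant,es,medialuna,0.98
🥐,Croissant,croissant,es,desayuno,0.5
"""

def load_keywords(csv_text):
    """Group (emoji, word, weight) tuples by language code."""
    by_language = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        by_language[row["language"]].append(
            (row["emoji"], row["word"], float(row["weight"]))
        )
    return dict(by_language)

keywords = load_keywords(SAMPLE_CSV)
print(keywords["es"])
```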

Runtime Matching

The app keeps the base data in a D1 table called emoji_keywords. When suggestions are needed, it loads the selected language into an in-memory model:

  • emojiWordWeights: emoji to keyword weights
  • emojiTotals: total keyword weight per emoji
  • tokenIndex: input token to candidate emojis
  • basePairs: known token and emoji pairs from the generated dataset
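One way to picture these four structures is the sketch below, which builds them from (emoji, word, weight) rows. The snake_case names mirror the list above, but the exact shapes are guesses. The 🥐 rows come from the generated table, while the 🍞 rows are invented so that one token points at two emojis.

```python
from collections import defaultdict

def build_model(rows):
    """Build lookup structures from (emoji, word, weight) rows of one language.

    Assumes each (emoji, word) pair appears at most once.
    """
    emoji_word_weights = defaultdict(dict)  # emoji -> {word: weight}
    emoji_totals = defaultdict(float)       # emoji -> sum of its keyword weights
    token_index = defaultdict(set)          # word -> emojis that list this word
    base_pairs = set()                      # known (word, emoji) pairs
    for emoji, word, weight in rows:
        emoji_word_weights[emoji][word] = weight
        emoji_totals[emoji] += weight
        token_index[word].add(emoji)
        base_pairs.add((word, emoji))
    return {
        "emoji_word_weights": dict(emoji_word_weights),
        "emoji_totals": dict(emoji_totals),
        "token_index": dict(token_index),
        "base_pairs": base_pairs,
    }

ROWS_ES = [
    ("🥐", "medialuna", 0.98),
    ("🥐", "croissant", 0.85),
    ("🥐", "desayuno", 0.5),
    ("🍞", "pan", 1.0),       # invented row
    ("🍞", "desayuno", 0.4),  # invented row
]

model = build_model(ROWS_ES)
print(model["token_index"]["desayuno"])
```

The token index is what keeps lookups cheap: only emojis that share at least one token with the input ever need to be scored.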

The input text is normalized by lowercasing, removing punctuation, preserving Spanish and German characters, and doing a naive plural strip.
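A minimal sketch of such a normalizer, assuming that "preserving Spanish and German characters" means keeping á, é, í, ó, ú, ü, ñ, ä, ö and ß, and that the plural strip simply drops a trailing "s" from longer tokens. Both details are assumptions, as are the function names.

```python
import re
import unicodedata

# After lowercasing, anything that is not a letter, digit, or whitespace
# from this set is treated as punctuation and replaced with a space.
_NOT_WORD = re.compile(r"[^a-z0-9áéíóúüñäöß\s]")

def strip_plural(token):
    """Naive plural strip: drop a trailing "s" from tokens longer than 3 characters."""
    if len(token) > 3 and token.endswith("s"):
        return token[:-1]
    return token

def normalize(text):
    """Lowercase, remove punctuation, split on whitespace, strip plurals."""
    # NFC keeps "ü" as one code point so the regex above does not drop the accent.
    lowered = unicodedata.normalize("NFC", text).lower()
    cleaned = _NOT_WORD.sub(" ", lowered)
    return [strip_plural(token) for token in cleaned.split()]

print(normalize("Medialunas!"))
print(normalize("Frühstück, Eggs"))
```

A strip this naive will mangle some words, for example English plurals ending in "-es" or singular words that happen to end in "s". For short list items that trade-off may be acceptable, since a miss only means no suggestion.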

The lookup path is boring, which is exactly why I like it:

Lookup path: list item text → tokens → token index → candidate emojis → scores → top result.

Scoring

I remembered a few things from my statistics courses, fortunately.

Weighted Jaccard

Jaccard similarity is a way to compare two sets by asking: how much do they overlap compared to everything they contain?

In the normal unweighted version, this is:

\[ J(A, B) = \frac{|A \cap B|}{|A \cup B|} \]

For this feature, the emoji keywords are weighted, so a very relevant word like tomato can count more than a weaker association like food.

For each candidate emoji, the app computes:

\[ \text{intersection} = \sum \text{matched keyword weights} \]

\[ \text{union} = \text{total emoji keyword weight} + \text{input token count} - \text{intersection} \]

\[ \text{score} = \frac{\text{intersection}}{\text{union}} \]

This rewards emojis that match the input, but penalizes overly broad emojis with lots of keywords. Without that penalty, generic emojis can win too often just because they have many possible words attached to them.
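The formula can be sketched in a few lines. This version assumes that "input token count" means the number of distinct input tokens. The 🥐 weights are the Spanish rows from the generated table, while the broad 🍽️ entry is invented to show the penalty.

```python
def weighted_jaccard(tokens, word_weights):
    """Score one emoji: matched keyword weight divided by the weighted union."""
    unique_tokens = set(tokens)
    intersection = sum(word_weights.get(tok, 0.0) for tok in unique_tokens)
    total = sum(word_weights.values())
    union = total + len(unique_tokens) - intersection
    return intersection / union if union > 0 else 0.0

CROISSANT = {"medialuna": 0.98, "croissant": 0.85, "desayuno": 0.5}

# Invented: a generic "food" emoji with many keywords attached.
BROAD_FOOD = {
    "medialuna": 0.98,
    "comida": 1.0,
    "cena": 0.9,
    "almuerzo": 0.9,
    "desayuno": 0.9,
}

specific = weighted_jaccard(["medialuna"], CROISSANT)
broad = weighted_jaccard(["medialuna"], BROAD_FOOD)
print(round(specific, 2), round(broad, 2))
```

Both emojis match "medialuna" with the same weight, but the croissant scores roughly 0.42 against roughly 0.21 for the broad emoji, because the broad emoji's larger total weight inflates the union.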

Max Overlap

The second method is simpler:

\[ \text{score} = \sum \text{matched keyword weights} \]

This asks: which emoji had the strongest direct keyword hits?

I set the top weighted-Jaccard result to be displayed as the first option, and max-overlap as the second option.
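The two scorers can disagree, which is why showing both is useful. The sketch below ranks candidates with each method and returns one pick per method. All weights are invented, the function names are hypothetical, and the sketch does not handle the case where both methods pick the same emoji.

```python
def max_overlap(tokens, word_weights):
    """Sum of the weights of keywords that appear in the input."""
    return sum(word_weights.get(tok, 0.0) for tok in set(tokens))

def weighted_jaccard(tokens, word_weights):
    """Matched weight divided by (total weight + token count - matched weight)."""
    unique_tokens = set(tokens)
    intersection = max_overlap(unique_tokens, word_weights)
    union = sum(word_weights.values()) + len(unique_tokens) - intersection
    return intersection / union if union > 0 else 0.0

def suggest(tokens, emoji_word_weights):
    """Return (first, second): best weighted-Jaccard pick, then best max-overlap pick."""
    candidates = {
        emoji: weights
        for emoji, weights in emoji_word_weights.items()
        if any(tok in weights for tok in tokens)
    }
    if not candidates:
        return (None, None)
    first = max(candidates, key=lambda e: weighted_jaccard(tokens, candidates[e]))
    second = max(candidates, key=lambda e: max_overlap(tokens, candidates[e]))
    return (first, second)

# Invented weights: a narrow coffee emoji and a broad breakfast emoji.
WEIGHTS = {
    "☕": {"cafe": 1.0},
    "🍳": {"desayuno": 0.9, "cafe": 0.3, "huevo": 1.0, "tostada": 0.8, "jugo": 0.7},
}

print(suggest(["cafe", "desayuno"], WEIGHTS))
```

Here the narrow ☕ wins on weighted Jaccard (0.5 against about 0.27), while the broad 🍳 wins on raw overlap (1.2 against 1.0), so the two slots show different emojis.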

Fallbacks and Feedback

The dataset contains English, Spanish, and German keywords, but there is no need for me to pick a language. At runtime it just tokenizes the item text and tries to match against the configured keyword tables.

If nothing matches, nothing dramatic happens. The item just shows the fallback character “-” and we can open the emoji picker manually.

When an emoji is selected by hand, the app records an append-only feedback event with the original text, tokens, shown suggestions, and chosen emoji.
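As an illustration of what one such event might look like, here is a small sketch using a frozen dataclass and a list of JSON lines as a stand-in for the real storage. The field names and the in-memory log are assumptions. The post only says which pieces of information are recorded.

```python
import json
import time
from dataclasses import asdict, dataclass, field

@dataclass(frozen=True)
class FeedbackEvent:
    text: str     # original list item text
    tokens: list  # normalized tokens derived from the text
    shown: list   # suggestions that were displayed
    chosen: str   # emoji picked by hand
    created_at: float = field(default_factory=time.time)

def append_event(log, event):
    """Append one JSON line; earlier entries are never modified."""
    log.append(json.dumps(asdict(event), ensure_ascii=False))

feedback_log = []
append_event(
    feedback_log,
    FeedbackEvent("Medialunas", ["medialuna"], ["🌙", "🌛"], "🥐"),
)
print(feedback_log[0])
```

Keeping the raw events means the override weights can always be recomputed later if the rules for deriving them change.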

It also keeps override weights for token and emoji pairs. The idea is:

  • manually chosen emoji are the strongest signal
  • a manual token-to-emoji match gets treated as 1.0
  • generated data is useful, but our private feedback is better
  • rejected suggestions can be slightly demoted

For example, if the generated dataset thinks pan should map to 🍳, but in our actual lists we always choose 🍞, the system should learn that.

Also, we can change emojis at any time. Keeping track of timestamps is key, because I can expect that the most recent emoji assigned manually is the one that we actually wanted.
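These rules could be sketched as follows. The event tuple layout, the function names, and the penalty of 0.1 are assumptions. Only the 1.0 weight for manual matches and the "most recent choice wins" idea come from the description above, and the base weights for pan are invented.

```python
MANUAL_WEIGHT = 1.0   # a manual token-to-emoji match is treated as 1.0
REJECT_PENALTY = 0.1  # assumed size of the "slight" demotion

def summarize(events):
    """events: (timestamp, token, chosen_emoji, shown_emojis) tuples."""
    latest, rejected = {}, set()
    for _, token, chosen, shown in sorted(events, key=lambda e: e[0]):
        latest[token] = chosen  # later events overwrite earlier ones
        rejected.update((token, emoji) for emoji in shown if emoji != chosen)
    return latest, rejected

def effective_weight(token, emoji, base_weights, latest, rejected):
    """Manual choice first, then the generated weight minus any demotion."""
    if latest.get(token) == emoji:
        return MANUAL_WEIGHT
    weight = base_weights.get((token, emoji), 0.0)
    if (token, emoji) in rejected:
        weight = max(0.0, weight - REJECT_PENALTY)
    return weight

# Invented generated weights for the "pan" example.
BASE = {("pan", "🍳"): 0.9, ("pan", "🍞"): 0.6}

# Deliberately out of order: the timestamp decides, not the list position.
EVENTS = [
    (2, "pan", "🍞", ["🍳"]),
    (1, "pan", "🍳", ["🍳"]),
]

latest, rejected = summarize(EVENTS)
print(effective_weight("pan", "🍞", BASE, latest, rejected))
print(effective_weight("pan", "🍳", BASE, latest, rejected))
```

After the second event, 🍞 is treated as a full-strength match for pan, and 🍳 drops slightly below its generated weight because it was shown and not chosen.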

Final thoughts

The whole model is loaded into memory per language and cached. At this scale, this is perfect: queries take single-digit milliseconds (I know, it could be much faster, but for a serverless database, I’m happy).

The dataset is small. The app is private. The input text is tiny. There are no runtime model calls, no embeddings service, and no network dependency just to decide whether coffee should become ☕.

That is the part I like most about this feature. I started with the more “AI” solution, found that it was not the right fit, and ended up using AI offline to create a small deterministic dataset.

The last few months of using the lists app were great. I really like having the visual signal of emojis on all lists, because it makes them much easier to scan. My main takeaway from this little project was that leveraging small open-source LMs to generate useful, “domain-specific” text data can yield good products.