How and when to use Python frozenset() with examples

A lot of intermediate Python programmers don’t even know this data type exists

Jeffrey Boschman
Towards Dev

--

Overview (tldr)

Frozenset() is a Python built-in function that takes an iterable object (e.g. tuple, list, set, dict, etc) as an input, and returns an immutable (i.e. “frozen”) set. For dictionaries, only the keys are part of the immutable set. It is often used so you can use the frozenset object as a key for a dictionary or as elements of another set (for example, to find anagrams using a hashmap).

What is a frozenset?

Frozenset() is a function built into Python. That you means you don’t have to import any package or module to use it.

It takes an iterable object (like a tuple, list, set, dict, etc), and makes an immutable (i.e. it freezes them so you can’t change the elements in them) set. So essentially a frozenset type is … a set … that is frozen (makes sense, right?). It is important to note that for dictionaries, a frozenset will only retain the keys (however, there is a trick to get around this. See the “Example finding anagrams using frozensets and a hash map” below).

Output:

In the above example, you can see how only the keys {1, 2, 3} from the dict {1:’a’, 2:’b’, 3:’c’} remain in the frozenset. You can also see how you can’t reassign the elements of a frozenset (because it is immutable).

However, there are a few built-in frozenset methods so you can do some operations with them, like finding the intersection, union, difference, or symmetric_difference of two frozensets.

These are essentially the same methods as for regular Python set objects

Why use frozenset?

So now you may ask: why would anyone use a frozenset instead of a regular set? Well, a frozenset is often used so you can use the frozenset object as a key for a dict or as elements of another set. Here’s an example:

Example finding anagrams using frozensets and a hash map

Say you have a list of strings and you want to find the first pair of anagrams (an anagram is a word, phrase, or name formed by rearranging the letters of another, such as spar, formed from rasp) or if a pair exists at all.

In this example, ‘cat’ and ‘tac’ are anagrams.

One way to do this would be compare each element of the list to every other element of the list and see if they are anagrams. The problem with this method is that it would be O(n²) time complexity because you have to iterate over the entire list for each element of the list (in worst case, the entire list).

There is a faster way of doing doing this task in linear (O(n)) time with hashtables. You can iterate over the list just once, comparing each element to the keys of a hashtable (which has O(1) lookup time) that contains previously seen elements.

Now, we just need a way to represent each string in a general form that would be the same for all other anagrams of that string. One way to do this is to use the count the occurrences of each letter in a string.

Python’s collections.Counter (a dictionary subclass) is perfect for this. A Counter object is a collection where elements are stored as “dictionary” keys and their counts are stored as “dictionary” values. If you apply Counter on a string, the letters will be the keys and their counts will be the values.

You can see that Counter objects are essentially dict objects with a wrapper around them.

So let’s just add the Counter object (or dict) of the string letter counts to a hashtable (a set in these examples). But wait, we can’t!

We can see that Counter objects can’t be used as keys in a hash table

(or explicitely with dictionaries)

The same problem with dict objects (from now on, I won’t show the conversion to dict, as it is apparent that Counter is a subclass of dict)

This is where frozenset comes in.

But wait (again), you can’t use frozenset on dict objects, because it will only retain the keys (the letters).

Only the keys (the letters) remain, but we also need the counts of each letter. Otherwise it would say that ‘cat’ and ‘caat’ are anagrams (they are not) just because they have the same letters

So we need to use a trick to convert each key, value pair of the dictionary to tuples. We can use the .items() method for this.

Now we have frozensets that contain the letters and their counts for each string

Now we can each add each frozenset to a hash table.

Now each frozenset object is an element of the set (a hashtable in Python)

(Bonus just to confirm that you need frozenset even with the tuples of the Counter object items)

Same problem as with dict or Counter objects — that’s why we need to use frozenset!

Final code example to find the first anagram in a list of strings

So now we can use this general form (frozenset of Counter items tuples) as the keys of the hashtable.

We can iterate over the list of strings once, and for each string we can just check if it’s general form exists as a key in the hashtable (which means the current string is an anagram of a previously seen string). Then we can print the current string and the previous string that are anagrams.

Otherwise, we just add the string’s general form (and it’s index so we can find it later) to the hashtable, and move onto the next string in the list.

Example with anagrams

Example with no anagrams

If there are no anagrams in the list, then the return statement in the for loop will never execute and we will get to the end of the function.

Conclusion

I hope this post helped you understand Python frozensets and how to use them more.

If you liked it, please consider following me to get updates on all my new posts about Python tricks and machine learning (specifically, deep learning in medicine).

If you are not a Medium member and you would like to get unlimited access to my posts, please consider joining Medium with my referral link (disclaimer: part of the monthly fee will directly support me).

--

--

An endlessly curious grad student trying to build and share knowledge.