Mahjongg Solitaire Solver


About Mahjongg Solitaire

Mahjongg Solitaire is a tile matching game written by Brodie Lockard in 1981, and it has become extremely popular since. Activision published it in 1986 under the name Shanghai® using tiles from a chinese four-player game called Mahjong. For this reason it is often also called Mahjongg Solitaire and sometimes just Mahjongg, which is indeed annoying for the players of the real Mahjong game for the confusion it causes. Other names I've seen around are: Taipei, Solitile, The Turtle and variants of the name Mahjongg such as Mah Jong, Mahjong, Majong, Mah-Jongg, all either with or without the “Solitaire” suffix. Since Shanghai® is a registered trademark by Activision and the game is even more widely known as Mahjongg Solitaire I'll be using this name. See also http://home.halden.net/vkp/vkp/history.html for more information about the history of the game. That site also holds lots of information which will be of interest to people who want to know more about it.

The idea of the game is as follows: the tiles are organized in a layout, usually with some of the tiles covering others (there can be several layers of tiles). A tile is considered to be blocked if either there is a tile covering it (even partially), or if there is at least one tile touching its left side and at least one tile touching its right side. If a tile is not blocked then it is said to be free. A move consists of removing two matching free tiles. Two tiles are said to match if they belong to the same group; in the most usual variants there are four tiles per group (always an even number, in any case) and 36 groups, i.e. 144 tiles in total. The object of the game can vary depending on the variant played as we'll see later, but in the case we're interested in the aim is to remove all tiles by making moves. A board (an arrangement of tiles in a layout) is solved if all tiles are removed. We use the term “position” to refer to the combination of tiles present or absent in the board after any given number of moves.

It's not always possible to remove all tiles; the difficulty is very influenced by the layout. Let's examine three extreme cases. One of them is a layout in which each tile is stacked on top of the previous one (144 layers in total with the 36-group 4-tiles-per-group convention); obviously there's just one single tile free at start and there's no possible first move no matter the arrangement. Another pathological case is all of the tiles placed in a single vertical column. In this case all tiles are free and the game is trivial. The last one is a horizontal row of tiles; for this layout most random arrangements will make the game unsolvable, and only the arrangements that are palindromic (which are extremely unlikely for uniformly distributed permutations) will be solvable but will also make the game trivial.

With some layouts, however, it's not so easy to determine whether a given board (a layout holding shuffled tiles in certain order) is solvable or not. There is a proof showing that this problem belongs to the NP-complete class in the general case. Now we know which kind of problem we're facing.

There are various ways to play the game: as a puzzle, against the clock, against another player, etc. either with or without a scoring system. Some variants' goal is to maximize the score, which often accounts for the time taken by removing tiles and it may happen that the player can obtain a higher score by playing quickly, even if the board is not cleared, than by clearing it. We're only interested in the puzzle variant: the player must find a way of removing all tiles until the board is solved. There is no limit to the number of retries, and the acquired experience usually plays a fundamental role for the success. Using undo or hint features is sometimes considered cheating; if you make a wrong move you have to start again.

The variant just described belongs to the class of games that are said to be “complete-information games”. There's a proof that there's no deterministic algorithm to solve the solitaire but the proof applies only to the incomplete-information variant and not to this one, i.e. it assumes that the information presented to the solving algorithm is only the tile faces that are visible to a player during a single game. In this incomplete-information version the choice of the best strategy is a problem whose complexity in the general case has been proved to be PSPACE-hard (the paper is in PostScript format). In a nutshell, PSPACE-hard means that it's believed to be much harder than a NP-complete problem. However we assume complete knowledge of the board's contents even for tiles totally covered, so these objections don't apply. After all, what we're after is a method to determine whether a board is solvable so that we know in advance that we're not going to play an unsovable board. Since the method is constructive, we get a solution as an extra bonus; if a board's solution escaped our ability we can see how it could be solved.

Some of the Mahjongg Solitaire playing programs have the option to generate boards that are guaranteed to be solvable. However in my own experience these boards tend to be easier to solve than boards coming from a random permutation of tiles and I consider them to be less interesting. There are other programs which generate biased shuffles.

This page is dedicated to the divulgation of an algorithm for determining whether a given board can be solved and give an explicit solution if so. We focus in the 36×4 variants but most of the principles explained here can be extended to others. However the complexity with this approach is dramatically increased with more than four tiles per group as we will see.

Algorithm preliminaries

After playing the game for a while there is something that becomes apparent: it is not actually important the order in which tiles are removed but rather the choice of pairings. I don't have a formal proof for this but it seems to be very clear and in fact the algorithm presented here works under the assumption that this conjecture is true. Well, at least I have an informal proof (you're advised to read at least the next three paragraphs before trying to read the proof).

In order to clarify things let me explain what I mean by the choice of pairings. We distinguish each individual tile within a group, even if the external aspect of the tiles of a group in the common variant is not distinguished except for the “seasons” and the “flowers”. Now in order to solve a board, what matters is which tile within each group is paired with which one in order to remove them, rather than whether tiles from a group are removed before or after tiles from a different group.

An example will help with clarification. We will use the notation group:tile to refer to a particular tile, e.g. 14:0 is tile 0 of group 14. As we deal with four tiles per group and 36 groups the tile numbers within a group will range from 0 to 3 and the group numbers will range from 0 to 35, so the whole set of tiles ranges from 0:0 to 35:3.

Let's consider a position in which there are six tiles free: 14:0, 14:1, 14:3, 18:0, 18:2 and 18:3. The conjecture states that in order to find a solution it's not important whether tiles from group 14 or group 18 are removed first, but which pairings are done when removing tiles, i.e. if we remove 14:0 and 14:1, 14:0 and 14:3, or 14:1 and 14:3, and similarly with group 18 no matter the order, i.e. no matter whether group 14 or 18 is removed first.

We need to define a more formal concept of pairing. There are three ways of grouping the four tiles of a group g into two pairs of two tiles each, namely:

Note: with 6 tiles per group the number of possible pairings goes up to 15, making the solving algorithm presented here impractical.

We use the convention of labeling the pairing for a group with a digit from 0 to 2 which corresponds to the tile that tile 3 is paired with. Under this convention, for example, pairing 1 refers either to tiles g:2 and g:0 or to tiles g:3 and g:1.

Now, under the assumption of the conjecture above, a solution to a certain board can be expressed by specifying 36 pairings one for each group, or more simply, by a 36-digit ternary code. We call such a ternary code a “pairing set” and adopt the convention that group 0 is in the rightmost position of the code and group 35 is in the leftmost, according to the weights of the digits, but this is only of interest for the interpretation of the solution as printed by the solving program.

A given pairing set may or may not be a solution to a board. Finding a pairing set that is a solution is the object of the algorithm presented here. The idea of pairings reduces the search space to a point where it starts being practically feasible to traverse the tree, even though with certain boards it will still take very long as we will see.

To apply a pairing set is to repeatedly look for and remove pairs of free tiles matching their corresponding pairing until it is not possible to find any pair of free tiles whose pairing is in the pairing set. Under the assumption of the conjecture, if a pairing set is a solution to the board then after applying it we should end up with an empty position. Here's a simple algorithm for applying a pairing set specified in the input array pairing[] which holds the pairing number for each group. For simplicity, the algorithm considers that the tiles that have been removed are not free.

  1. (Initialization) Let g = 0.
  2. (Remove tiles according to pairing)
    If pairing[g] = 0 and tiles g:0 and g:3 are both free,
     remove g:0 and g:3 and go to step 1.
    If pairing[g] = 0 and tiles g:1 and g:2 are both free,
     remove g:1 and g:2 and go to step 1.
    If pairing[g] = 1 and tiles g:1 and g:3 are both free,
     remove g:1 and g:3 and go to step 1.
    If pairing[g] = 1 and tiles g:0 and g:2 are both free,
     remove g:0 and g:2 and go to step 1.
    If pairing[g] = 2 and tiles g:2 and g:3 are both free,
     remove g:2 and g:3 and go to step 1.
    If pairing[g] = 2 and tiles g:0 and g:1 are both free,
     remove g:0 and g:1 and go to step 1.
  3. (Next group) Let g = g + 1.
    If g = 36 then the algorithm terminates;
     otherwise go to step 2.

We're after a concrete pairing set that is a solution to the board, but during the intermediate stages of the solver it's convenient to declare the pairings for some groups as undefined. This can be specified by a pairing number that is not 0, 1 or 2; for example 3. When applying a pairing set which has undefined pairings, the groups with undefined pairings should not be touched at all, thus the above algorithm remains valid. Obviously a pairing set with undefined pairings can't possibly be a solution.

The algorithm

Now it's time to explain how the algorithm works. It uses conventional tree traversal, but with a strategic choice of nodes an branches. Each node holds a position (information of tiles already present in the board) which is passed to the children and a list of groups which is local to the node. The initial position is the root node. The pairing set is global and is initialized to all unknown; we will refer to it as pairing[]. For each node the algorithm proceeds as follows:

  1. (Apply) Apply the current pairing set to the passed-down position. If all tiles were removed then the algorithm terminates, printing the current pairing set as a solution.
  2. (Build the group list) Build a local list of groups for which there is a pair of tiles that can be removed (independently of how they are paired) and have their corresponding pairing equal to unknown in the pairing set. We call glist[] the list and gcount the count of elements in the list. If the list is empty (gcount = 0) then this is a leaf node (dead end); return if so.
  3. (Set the initial values of the pairings) For each group in the list, set the corresponding group in the global pairing set to zero. That is, for j = 0 to gcount-1 set pairing[glist[j]] = 0.
  4. (Branch) Call this routine recursively, passing the current position to the child.
  5. (Prepare to skip to next ternary combination) Let j = 0.
  6. (Skip to the next ternary combination) If j >= gcount then go to step 7.
    If pairing[glist[j]] = 2 then set it to 0, set j = j+1 and go to step 6.
    Otherwise increase pairing[glist[j]] and go to step 4.
  7. (Clean up and return) For all groups in the list set the corresponding pairings in the global pairing set to unknown. That is, for j = 0 to gcount-1 set pairing[glist[j]] = 3. Return: no solution was found in the current branch.

Steps 5 and 6 perform an increment with carry, in order to try all possible variations of pairings for the groups in the list.

Step 1 implies a "sudden death", an early termination of the algorithm, in case a solution is found. If the algorithm returns (via step 2 or step 6) from the call to process the root node then there's no solution.

This algorithm has 3^(newly free pairs) branches per node. This means that with certain layouts with lots of free tiles and certain shuffles with lots of possible pairs which do not lead to a solution it will be horribly slow. However in practice with the usual layouts such as the Turtle (a.k.a. the Classic, the Dragon or the Pyramid) the solution is found in more or less reasonable time. With my machine (a 1600 MHz Athlon) it took about a week for the worst cases, and below an hour in the typical case.

Optimizations

Of course there's room for improvement. For example the algorithm is easily convertable to nonrecursive (iterative) using a stack; indeed I have written a nonrecursive implementation in Pascal. This facilitates the early stop in step 1 and can in some cases be faster.

For some layouts there are possible layout-specific enhancements too. The Turtle layout, for example, has a characteristic that can be used for simplification of the blocking test routine: all tiles except one are blocked at most by one tile to the left, at most by one tile to the right, and at most by one tile covering them. There is just one tile (in the center row, near the right edge) breaking this rule, as it is blocked by two tiles to its left and one to its right.

But the enhancements are not just implementation-specific. Given the amount of branches that have to be analyzed, a constant factor added to each node will surely pay off if it has enough impact in reducing the branching factor.

When all four tiles of a group are free in a given position it doesn't matter which tile is removed with which. That is an advantage that can be used; for example we can modify the solving algorithm so that we “freeze” the pairing to an arbitrary value (e.g. 0) instead of trying all variations. A similar situation happens when there is a pair of tiles such that, when removing it, the other pair of the same group can be immediately removed. In that case the pairing can be fixed to the one which makes this possible.

There is another case for which the pairing can be decided in advance. Consider the following situation: there are exactly three free tiles of a certain group, and one of them can be removed at any time and its presence or absence will have absolutely no influence on the others. In this case, removing the other two tiles will surely lead towards a solution and will never have a negative influence; if it were removed together with one of the others, the remaining one could be an obstacle later, but if it is not removed in that moment then it can't possibly be an obstacle.

Now how to determine the situations in which such a “loose tile” exists? Obviously a tile with no other tile covered by it nor to its left or right sides (a standalone tile) is a loose tile. The same happens when the “loose candidate” has an adjacent tile to its left or right side which in turn does not have any other tile to the other side. It doesn't matter if that tile has others covering it. There can also be two tiles with these characteristics as long as both are at the same side of the loose candidate and touching it.

A similar opportunity arises when there are at least two tiles free which when removed cause a third tile of the same group to become loose. The result of this all is similar to the case of being four tiles free, since as soon as the fourth tile appears it can be immediately removed.

Maybe the actual implementation of the analysis for these situations adds an important factor, or maybe not. The four-tiles-free case is by far the simplest to analyze/check. There's a possibility of keeping a list of free tiles which is updated with each removal, as a facility for the program.

It's also possible to make a prior analysis of the board in order to discard some relatively common cases where solvability can be determined in advance, most notably one tile which blocks another of the same group by standing on top of it, either directly or indirectly (with tiles from other groups in between), which in turn does the same to a third tile of that group. It can also happen that three tiles of a group are directly or indirectly covered by another. It's worth noting that one single tile can indirectly block many tiles in this way; consider for example a subgroup of tiles arranged forming a pyramid in which each tile covers 4 tiles except for the tiles in the bottommost layer. This kind of analysis is almost instant and rejects an important part of the total amount of unsolvable boards. In the Classic layout, for example, the topmost tile blocks 4 stacks of 4 tiles each, and it's not uncommon that there are two more tiles of that group in the same stack or even three in the four stacks.

There are other cases where this analysis could help; for example the case in which all of the tiles from two different groups are on a single row and the leftmost and the rightmost ones belong to different groups; however that is not so common as to be worth being checked.

As a final note, the characteristics of the algorithm make it suitable for distributed solving: each CPU can try a different branch of the tree, since the amount of information needed to initialize each branch is very low.

Code

This program is an implementation of the algorithm in C (following GNU style as closely as I could) with the purpose of illustrating how it works:

Use this address to send me E-mail: parigalo@formauri.es.
NOTE: THIS ADDRESS IS NOT PERMANENT. Always verify this page to send me E-mail because it's likely to change often for spam reasons.

Any kind of suggestions are welcome.

Back


© Copyright 1998,1999,2000,2002,2003,2004,2005 Pedro Gimeno Fortea. All rights reserved.
This is valid XHTML 1.0.