I stored files inside of Minecraft, and here’s how it works

Files, at their essence, are simply collections of data neatly packed into a single container. Modern operating systems often rely on file extensions to discern the nature of these files, dictating how the data is organized for meaningful interpretation. However, the concept of a “file” is not particularly unique, and the extension is only a hint: renaming a JPG so that it ends in .zip does not change a single byte of its contents, and software that inspects the data rather than the name will still open it as an image.

With this understanding, the potential for data representation expands significantly. Having previously demonstrated how files could be embedded within Pokémon Emerald, I ventured further into the realm of Minecraft. The game’s virtually limitless world presents an intriguing opportunity: theoretically, any file could be stored within it, provided one knows how to interpret the data afterward. This endeavor, while meticulous, serves as an excellent illustration of data storage and retrieval techniques.

Setting the stage

Understanding data storage

My primary goal was to establish a straightforward method for representing data within Minecraft that could be easily accessed through legitimate means. After considering various complex ideas, such as utilizing the orientation of stripped wooden logs or employing item frames, I realized that the game’s 16 available wool colors were ideal. Each wool block can represent four bits of data, allowing for a full byte to be stored every two blocks.

At its core, a file consists of a sequence of bytes, and each byte can be split into two four-bit halves (nibbles), each of which corresponds to one hex digit. This division is lossless: the original content remains intact as long as the halves are reassembled in the same order. Armed with this knowledge, I created a mapping of hex digits to wool colors, enabling the reading and writing of data within the game. For instance, a 67-byte file needs 134 wool blocks for its data; laid out 12 columns wide, that rounds up to a 12 by 12 grid of 144 blocks, with the last 10 serving as padding to keep the grid rectangular.
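The splitting step and the block count can be checked with a short sketch. This is not the article's encoder; the function names are hypothetical, and placing the high half of each byte first is an assumption about block order.

```python
import math

def to_nibbles(data: bytes) -> list[int]:
    """Split each byte into two 4-bit halves, high half first (assumed order)."""
    nibbles = []
    for byte in data:
        nibbles.append(byte >> 4)    # upper four bits
        nibbles.append(byte & 0x0F)  # lower four bits
    return nibbles

def from_nibbles(nibbles: list[int]) -> bytes:
    """Rejoin consecutive pairs of nibbles into the original bytes."""
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))

def grid_size(num_bytes: int, cols: int) -> tuple[int, int, int]:
    """Return (rows, total_blocks, padding_blocks) for a grid that is cols wide."""
    data_blocks = num_bytes * 2           # two wool blocks per byte
    rows = math.ceil(data_blocks / cols)  # round up to whole rows
    total = rows * cols
    return rows, total, total - data_blocks

sample = b"Hi"
nibbles = to_nibbles(sample)
print(nibbles)                          # [4, 8, 6, 9]
print(from_nibbles(nibbles) == sample)  # True
print(grid_size(67, 12))                # (12, 144, 10)
```

The last line reproduces the figures for the 67-byte example: 134 data blocks, 144 blocks in total, 10 of them padding.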

Here’s a glimpse of the mapping I devised:

Hex digit  Binary  Wool colour  Block ID (Java)
0          0000    White        minecraft:white_wool
1          0001    Light Gray   minecraft:light_gray_wool
2          0010    Gray         minecraft:gray_wool
3          0011    Black        minecraft:black_wool
4          0100    Brown        minecraft:brown_wool
5          0101    Red          minecraft:red_wool
6          0110    Orange       minecraft:orange_wool
7          0111    Yellow       minecraft:yellow_wool
8          1000    Lime         minecraft:lime_wool
9          1001    Green        minecraft:green_wool
A          1010    Cyan         minecraft:cyan_wool
B          1011    Light Blue   minecraft:light_blue_wool
C          1100    Blue         minecraft:blue_wool
D          1101    Purple       minecraft:purple_wool
E          1110    Magenta      minecraft:magenta_wool
F          1111    Pink         minecraft:pink_wool

For example, to encode the sequence 1111 0000 1010 0001, one would use:

  • Pink wool
  • White wool
  • Cyan wool
  • Light Gray wool
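In code, the mapping is a simple lookup indexed by the four-bit value. The sketch below is illustrative rather than the article's own implementation; `WOOL_BY_NIBBLE` and `blocks_for` are hypothetical names, and the high half of each byte is assumed to come first.

```python
# Index = hex digit value (0-15), entry = Java Edition block ID from the table.
WOOL_BY_NIBBLE = [
    "minecraft:white_wool",       # 0
    "minecraft:light_gray_wool",  # 1
    "minecraft:gray_wool",        # 2
    "minecraft:black_wool",       # 3
    "minecraft:brown_wool",       # 4
    "minecraft:red_wool",         # 5
    "minecraft:orange_wool",      # 6
    "minecraft:yellow_wool",      # 7
    "minecraft:lime_wool",        # 8
    "minecraft:green_wool",       # 9
    "minecraft:cyan_wool",        # A
    "minecraft:light_blue_wool",  # B
    "minecraft:blue_wool",        # C
    "minecraft:purple_wool",      # D
    "minecraft:magenta_wool",     # E
    "minecraft:pink_wool",        # F
]

def blocks_for(data: bytes) -> list[str]:
    """Return two block IDs per byte: high nibble first, then low nibble."""
    blocks = []
    for byte in data:
        blocks.append(WOOL_BY_NIBBLE[byte >> 4])
        blocks.append(WOOL_BY_NIBBLE[byte & 0x0F])
    return blocks

# 1111 0000 1010 0001 is the two bytes 0xF0 and 0xA1.
for block in blocks_for(bytes([0xF0, 0xA1])):
    print(block)
# minecraft:pink_wool
# minecraft:white_wool
# minecraft:cyan_wool
# minecraft:light_gray_wool
```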

While the manual placement of blocks can be tedious, the overall process of encoding data in this manner is quite manageable. I also developed an encoder that generates a reference image to aid in constructing the data format.

Encoding data

Creating an mcfunction file

Encoding the data proved to be a relatively straightforward task, especially when compared to the extensive time spent on the initial file storage concept. The encoder runs from the terminal and generates an “mcfunction” file, a plain-text script of commands that the game executes in order, allowing for the rapid placement of blocks without manual effort.

To invoke the encoder, run the following command (it requires the Pillow module):

python3 encoder.py hello.txt --cols 12 --y -60

This command specifies a maximum of 12 columns at a time and sets the Y level for testing in a superflat world. The resulting in-game structure is a matrix of wool blocks, with the possibility of varying widths based on the “cols” value.
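To make the idea concrete, here is a minimal sketch of how such a file could be generated. It is not the article's encoder.py: the function name, the origin parameters, and the layout (columns along +X, rows along +Z, high nibble first, white wool as padding) are all assumptions. Only the `setblock` command itself is standard Java Edition syntax.

```python
COLOURS = ["white", "light_gray", "gray", "black", "brown", "red", "orange",
           "yellow", "lime", "green", "cyan", "light_blue", "blue", "purple",
           "magenta", "pink"]
WOOL = [f"minecraft:{colour}_wool" for colour in COLOURS]

def mcfunction_lines(data: bytes, cols: int, y: int,
                     origin_x: int = 0, origin_z: int = 0) -> list[str]:
    """Build one setblock command per nibble, filling the grid row by row."""
    nibbles = []
    for byte in data:
        nibbles += [byte >> 4, byte & 0x0F]
    # Pad the final row with zeros (white wool) so the grid is rectangular.
    nibbles += [0] * (-len(nibbles) % cols)
    lines = []
    for index, nibble in enumerate(nibbles):
        row, col = divmod(index, cols)
        lines.append(
            f"setblock {origin_x + col} {y} {origin_z + row} {WOOL[nibble]}"
        )
    return lines

lines = mcfunction_lines(b"Hi", cols=3, y=-60)
print("\n".join(lines))
# setblock 0 -60 0 minecraft:brown_wool
# setblock 1 -60 0 minecraft:lime_wool
# setblock 2 -60 0 minecraft:orange_wool
# setblock 0 -60 1 minecraft:green_wool
# setblock 1 -60 1 minecraft:white_wool
# setblock 2 -60 1 minecraft:white_wool
```

Written to a .mcfunction file inside a datapack, lines like these can be run in-game with the /function command.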

Decoding files from Minecraft

A failed attempt at OCR, though reading world files works fine

Challenges arose during the decoding phase, particularly when I attempted to utilize image recognition to identify the blocks in a screenshot. Despite initial success with some methods, inconsistencies due to perspective changes and block lengths made this approach unreliable. Ultimately, I turned to Amulet, a Python library capable of reading directly from a world file, which provided a much more reliable solution.

While this method is effective, it depends on having the world file on disk, so it requires more manual effort if one wishes to share files through Minecraft servers. In that case the process involves taking a screenshot, reconstructing the wool matrix in a local world, and then decoding it, which is not the most user-friendly approach. However, the deterministic nature of reading individual blocks from the world file ensures accuracy.

When executing the decoder, users must define the coordinates of the wool matrix’s top left corner, choose the movement direction along the X and Z axes, and specify the matrix dimensions. Although this process is somewhat manual, it works reliably. The values to supply are:

  • Top left X
  • Top left Y
  • Top left Z
  • Dimension [overworld/nether/end] (default = overworld)
  • Width (cols)
  • Height (rows)
  • Column step dX dZ [1 0]
  • Row step dX dZ [0 1]
  • Padding (trailing white-wool blocks to ignore, 0 for none)
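The parameters above translate into a simple grid walk. The sketch below shows that logic under stated assumptions and is not the article's decoder: `decode_grid` is a hypothetical name, and `get_block_id` is a stand-in callable for whatever returns the block ID at a coordinate (in the real script that lookup would go through Amulet and the world file). Here a plain dictionary plays the role of the world.

```python
COLOURS = ["white", "light_gray", "gray", "black", "brown", "red", "orange",
           "yellow", "lime", "green", "cyan", "light_blue", "blue", "purple",
           "magenta", "pink"]
NIBBLE_BY_WOOL = {f"minecraft:{c}_wool": i for i, c in enumerate(COLOURS)}

def decode_grid(get_block_id, top_left, width, height,
                col_step=(1, 0), row_step=(0, 1), padding=0) -> bytes:
    """Walk the grid row by row and rebuild bytes from pairs of wool blocks."""
    x0, y, z0 = top_left
    nibbles = []
    for row in range(height):
        for col in range(width):
            x = x0 + col * col_step[0] + row * row_step[0]
            z = z0 + col * col_step[1] + row * row_step[1]
            block_id = get_block_id(x, y, z)
            if block_id not in NIBBLE_BY_WOOL:
                raise ValueError(f"unexpected block {block_id!r} at {(x, y, z)}")
            nibbles.append(NIBBLE_BY_WOOL[block_id])
    if padding:
        nibbles = nibbles[:-padding]  # drop trailing white-wool filler
    if len(nibbles) % 2:
        raise ValueError("odd number of data blocks; check the padding value")
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))

# Stand-in world: b"Hi" (nibbles 4, 8, 6, 9) in a 3 x 2 grid plus two padding blocks.
fake_world = {
    (0, -60, 0): "minecraft:brown_wool",
    (1, -60, 0): "minecraft:lime_wool",
    (2, -60, 0): "minecraft:orange_wool",
    (0, -60, 1): "minecraft:green_wool",
    (1, -60, 1): "minecraft:white_wool",
    (2, -60, 1): "minecraft:white_wool",
}
lookup = lambda x, y, z: fake_world.get((x, y, z), "minecraft:air")
print(decode_grid(lookup, (0, -60, 0), width=3, height=2, padding=2))  # b'Hi'
```

Keeping the block lookup behind a callable separates the grid arithmetic from the world-reading library, so the arithmetic can be tested without a save file.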

To run the script, one must include the --world flag:

python3 ./decodefromworld.py --world './New World'

Should the decoder encounter an unexpected block, it will raise an error, providing insight into necessary adjustments. Additionally, the output is always written as “decoded.bin,” so it must be renamed with the original file’s extension before other software will recognize it; renaming does not alter the data itself.

Upon executing the decoder, the output reveals the encoded message, demonstrating the successful mapping of each wool block to its corresponding hex value:

Hi there, this is a test file to show encoding a file in Minecraft!

The hex values match the wool blocks placed in the game one for one, confirming that the round trip from file to wool and back preserves the data.

Files can be represented by anything

It’s all about knowing how to retrieve it

This exploration has illustrated that files can be represented in myriad ways. Whether through a string of LEDs indicating binary states or a water bottle denoting varying levels of fullness, the key lies in defining a structure for data retrieval. This project serves as a testament to the creativity inherent in data representation, though it is not intended for practical use in its current form. For those interested in the underlying code, it is available on GitHub for further exploration.

AppWizard