Predicting the Next Multimodal Token with Emu3, and Python relative import issue.

Oct 2, 2024

I recently came across BAAI’s Emu3, an exciting model that aims to unify multimodal understanding and generation within a single architecture. It has the potential to revolutionize how we approach tasks involving text, images, and videos. In this article, I’ll share my experience exploring the Emu3 model, its setup, and how I overcame a common error encountered when using it.

Discovering Emu3

The Emu3 model immediately caught my attention due to its unique design based on the Transformer architecture. It’s built to handle multiple modalities — such as text, images, and videos — using a single model. Although the ultimate goal is to generate every modality using a single set of weights, the current implementation still requires separate sets of model weights for different tasks, which are planned to be unified in future updates.

Unfortunately, the model weights for video generation haven’t been open-sourced yet, but we can still explore the capabilities of the Gen and Chat models. The Chat model handles text generation, while the Gen model focuses on image generation.

Gen model: https://huggingface.co/BAAI/Emu3-Gen

Chat model: https://huggingface.co/BAAI/Emu3-Chat

Encountering a problem

Eager to try out Emu3, I started with the code snippet provided in the official repository. However, I quickly ran into an issue with the following lines:

import sys
sys.path.append(PATH_TO_BAAI_Emu3-Gen_MODEL)
from processing_emu3 import Emu3Processor

Python raised a NameError because PATH_TO_BAAI_Emu3-Gen_MODEL is only a placeholder for the local model directory, not a defined name. (Because the hyphen is parsed as a minus sign, the name Python actually reports as undefined is PATH_TO_BAAI_Emu3.) To find the model path, I downloaded the model manually with the Hugging Face CLI:

huggingface-cli download --resume-download BAAI/Emu3-Gen

The model was downloaded to the following path, as shown in the log:

/root/.cache/huggingface/hub/models--BAAI--Emu3-Gen/snapshots/a9da517b1a51b160360947020c4c24900529495f

However, simply replacing PATH_TO_BAAI_Emu3-Gen_MODEL with the downloaded path didn’t solve the problem. I encountered a new error:

ImportError: attempted relative import with no known parent package

The error message suggests two things:

Relative Imports in the Emu3 Code: The Emu3 codebase uses relative imports — for example, in the model repository, modeling_emu3.py imports configuration_emu3.py from the same folder.
Unknown Parent Package: Python couldn’t identify the parent package, making it impossible to resolve the relative imports.

Understanding Python’s Import Mechanism

To grasp why this error occurred, I revisited how Python handles imports. When Python encounters an import statement, it follows these steps:

Module Cache Check: Python checks if the module is already in sys.modules. If found, it uses the cached module.
Searching sys.path: If not cached, Python searches for the module in the directories listed in sys.path.
Module Loading: Upon finding the module, Python loads it and creates a new module object.
Resolving Relative Imports: If the module contains relative imports (e.g., from .util import ...), Python resolves them based on the module's __package__ attribute.
Handling Import Errors: If Python can’t resolve a relative import due to an unknown parent package, it raises an ImportError.
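The first three steps can be observed directly. The following is a minimal sketch, assuming a throwaway module written to a temporary directory; the module name demo_import_steps_mod is made up for this illustration and has nothing to do with Emu3.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module name, used only for this demonstration.
MODULE_NAME = "demo_import_steps_mod"

with tempfile.TemporaryDirectory() as tmp:
    # Write a tiny module into a directory that is not on sys.path yet.
    (Path(tmp) / f"{MODULE_NAME}.py").write_text("VALUE = 42\n")

    # Not cached and not findable on sys.path, so the import fails.
    try:
        importlib.import_module(MODULE_NAME)
        found_before = True
    except ModuleNotFoundError:
        found_before = False

    # Once the directory is on sys.path, the search succeeds and the module loads.
    sys.path.append(tmp)
    importlib.invalidate_caches()
    first = importlib.import_module(MODULE_NAME)

    # The module is now cached in sys.modules, so a second import
    # returns the very same module object instead of loading it again.
    second = importlib.import_module(MODULE_NAME)
    cached = sys.modules[MODULE_NAME] is first is second

    # Undo the changes made for the demonstration.
    sys.path.remove(tmp)
    del sys.modules[MODULE_NAME]

print(found_before, first.VALUE, cached)
```

The same file goes from "not found" to "importable" purely because of what is on sys.path, which is why the choice of directory added to sys.path matters so much in the rest of this article.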

The import machinery inside CPython is complex and hard to read, but importlib.import_module offers a clearer view of how relative imports are handled. The error messages differ slightly, but the underlying logic is the same. The relevant code is in Lib/importlib/__init__.py:

https://github.com/python/cpython/blob/2c050d4bc28bffd2990b5a0bd03fb6fc56b13656/Lib/importlib/__init__.py#L71

def import_module(name, package=None):
    """Import a module.
    The 'package' argument is required when performing a relative import. It
    specifies the package to use as the anchor point from which to resolve the
    relative import to an absolute import.
    """
    level = 0
    if name.startswith('.'):
        if not package:
            raise TypeError("the 'package' argument is required to perform a "
                            f"relative import for {name!r}")
        for character in name:
            if character != '.':
                break
            level += 1
    return _bootstrap._gcd_import(name[level:], package, level)
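To see the anchor requirement in action, here is a small sketch that calls importlib.import_module with a relative name, first without and then with a package argument. It uses the standard library's json package as the anchor purely for convenience.

```python
import importlib

# A relative name without an anchor package is rejected up front.
try:
    importlib.import_module(".decoder")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# With an anchor package, ".decoder" resolves to the absolute name "json.decoder".
decoder = importlib.import_module(".decoder", package="json")

print(error_message)
print(decoder.__name__)
```

The leading dot only has meaning relative to some package. The import statement gets that anchor from the importing module's own package context, which is what the next section is about.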

The Role of __package__

The __package__ attribute is crucial for resolving relative imports. It defines the package context for a module. When a module is imported as part of a package (e.g., import my_package.my_module), __package__ is set to 'my_package'. When a module is imported as a top-level module, __package__ is an empty string, and when a file is run directly as a script, it is None. In both cases there is no parent package to anchor a relative import.

In our case, since we added the model directory directly to sys.path and imported modules from there, Python treated them as top-level, standalone modules. Consequently, __package__ was empty, and the relative imports inside those modules failed. This is the import that triggered the error:

from processing_emu3 import Emu3Processor
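The difference in package context is easy to check. The sketch below imports the same file twice, once through its parent directory and once by putting its own directory on sys.path; the names demo_pkg_ctx and demo_mod_ctx are hypothetical and exist only in a temporary directory.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: <tmp>/demo_pkg_ctx/demo_mod_ctx.py
    pkg_dir = Path(tmp) / "demo_pkg_ctx"
    pkg_dir.mkdir()
    (pkg_dir / "demo_mod_ctx.py").write_text("NAME = __name__\n")

    # Case 1: the parent directory is on sys.path, so the module
    # is imported as a member of the demo_pkg_ctx package.
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    as_submodule = importlib.import_module("demo_pkg_ctx.demo_mod_ctx")
    package_when_nested = as_submodule.__package__

    # Case 2: the package directory itself is on sys.path, so the same
    # file is imported as a top-level module with no parent package.
    sys.path.insert(0, str(pkg_dir))
    importlib.invalidate_caches()
    as_top_level = importlib.import_module("demo_mod_ctx")
    package_when_top_level = as_top_level.__package__

    # Undo the changes made for the demonstration.
    sys.path.remove(str(pkg_dir))
    sys.path.remove(tmp)
    for name in ("demo_pkg_ctx", "demo_pkg_ctx.demo_mod_ctx", "demo_mod_ctx"):
        sys.modules.pop(name, None)

print(repr(package_when_nested))
print(repr(package_when_top_level))
```

The second case mirrors what happened with the Emu3 snapshot directory: the file is found, but it has no package to resolve a leading dot against.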

Solving the Problem

To resolve the import issue, I needed to ensure that Python recognized the modules as part of a package, such as snapshot.processing_emu3. This involved organizing the modules within a proper package structure.

Creating a Symbolic Link

First, I created a symbolic link (symlink) to the snapshot directory in my current working directory:

ln -s /root/.cache/huggingface/hub/models--BAAI--Emu3-Gen/snapshots/a9da517b1a51b160360947020c4c24900529495f snapshot

This allowed me to reference the model files relative to my current directory.

Modifying sys.path and Importing Correctly

Next, I modified my script to adjust sys.path and import the Emu3Processor correctly:

import sys
sys.path.append(".")
from snapshot.processing_emu3 import Emu3Processor

By appending "." (the current directory) to sys.path, I ensured that Python would search the current directory for modules. Importing Emu3Processor from snapshot.processing_emu3 preserved the package structure, allowing relative imports within processing_emu3.py to resolve correctly.
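The symlink-plus-sys.path approach can also be scripted. The following is a rough sketch, not the official Emu3 setup: link_as_package is a hypothetical helper, the directory and module names are stand-ins created in a temporary directory, and it assumes a POSIX-like system where creating symlinks needs no special privileges.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def link_as_package(source_dir, link_name, base_dir):
    """Symlink source_dir as base_dir/link_name and make base_dir importable.

    Hypothetical helper for illustration; not part of Emu3 or huggingface_hub.
    """
    link = Path(base_dir) / link_name
    if not link.exists():
        link.symlink_to(Path(source_dir).resolve(), target_is_directory=True)
    if str(base_dir) not in sys.path:
        sys.path.append(str(base_dir))
    importlib.invalidate_caches()
    return link


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for the snapshot directory: one module that uses a relative import.
    fake_snapshot = Path(tmp) / "cache" / "abc123"
    fake_snapshot.mkdir(parents=True)
    (fake_snapshot / "demo_config.py").write_text("SIZE = 720\n")
    (fake_snapshot / "demo_processing.py").write_text(
        "from .demo_config import SIZE\n"
    )

    work_dir = Path(tmp) / "work"
    work_dir.mkdir()
    link_as_package(fake_snapshot, "demo_snapshot_link", work_dir)

    # Imported through the link name, the module has a parent package,
    # so its relative import resolves.
    module = importlib.import_module("demo_snapshot_link.demo_processing")
    size = module.SIZE

    # Undo the changes made for the demonstration.
    sys.path.remove(str(work_dir))
    for name in list(sys.modules):
        if name.startswith("demo_snapshot_link"):
            del sys.modules[name]

print(size)
```

The link name doubles as the package name, so it has to be a valid Python identifier; that is one reason to link the snapshot under a short name such as snapshot rather than importing from the cache path directly.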

With this setup in place, the rest of the code runs smoothly. Note that the Gen model takes some time to generate an image, so be patient.

Result

This is the 720x720 image that I generated — not bad, right?

That said, the model doesn’t support generating interleaved text and images, which would be a fantastic feature. Hopefully, this will be added in a future release!

— — —

PS: As a fun exercise, try generating an image with the Chat model and text with the Gen model and see what you get :) Leave a comment below with your results :)

Appendix: Reproduction of the import issue

To better understand the issue, let’s replicate it with a simple setup:

my_module/
├── lib.py
└── util.py
user/
├── main.py
└── test.ipynb
wrong_usage/
└── main.py

In this structure, the my_module package contains two modules: lib.py and util.py. The lib.py module uses a relative import to access functions from util.py:

from .util import add, subtract, multiply

Now, let’s consider two scenarios:

Running user/main.py from inside the user directory (the relative path ".." is resolved against the current working directory):

import sys
sys.path.append("..")

print(sys.path)
import my_module.lib

print(my_module.lib.add(1, 2))
print(my_module.lib.subtract(1, 2))
print(my_module.lib.multiply(1, 2))

This script runs successfully because it imports my_module.lib using an absolute import, and the parent directory is added to sys.path.

Running wrong_usage/main.py from inside the wrong_usage directory:

import sys
sys.path.append("../my_module")
print(sys.path)

import lib
print(lib.add(1, 2))
print(lib.subtract(1, 2))
print(lib.multiply(1, 2))

This script raises the “ImportError: attempted relative import with no known parent package” because it tries to import lib directly as a top-level module, i.e. a standalone module, without specifying the parent package.

The behavior is the same whether the script is run as

python main.py
python -m main
or inside a Jupyter notebook.
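For convenience, the whole reproduction can be generated and run from one script. This is a sketch under a few assumptions: the layout is recreated in a temporary directory, lib.py is trimmed to a single add function, and each main.py is launched with its own directory as the working directory, as in python main.py above.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# File contents, trimmed from the appendix to a single function.
LIB = "from .util import add\n"
UTIL = "def add(a, b):\n    return a + b\n"
GOOD_MAIN = (
    "import sys\n"
    'sys.path.append("..")\n'
    "import my_module.lib\n"
    "print(my_module.lib.add(1, 2))\n"
)
BAD_MAIN = (
    "import sys\n"
    'sys.path.append("../my_module")\n'
    "import lib\n"
    "print(lib.add(1, 2))\n"
)


def run_main(directory):
    """Run main.py with directory as the working directory."""
    return subprocess.run(
        [sys.executable, "main.py"], cwd=directory, capture_output=True, text=True
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("my_module", "user", "wrong_usage"):
        (root / name).mkdir()
    (root / "my_module" / "lib.py").write_text(LIB)
    (root / "my_module" / "util.py").write_text(UTIL)
    (root / "user" / "main.py").write_text(GOOD_MAIN)
    (root / "wrong_usage" / "main.py").write_text(BAD_MAIN)

    good = run_main(root / "user")
    bad = run_main(root / "wrong_usage")

# The first script prints the sum; the second ends with the ImportError line.
print(good.stdout.strip())
print(bad.stderr.strip().splitlines()[-1])
```

The first line printed is the result of add(1, 2), and the second is the last line of the traceback from wrong_usage/main.py.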

(Co-written by Claude Opus + ChatGPT 4o)

Written by Xianbo QIAN
