The Truth About Magento and Entity-Attribute-Value
Magento is a serious e-commerce solution, built from the ground up for the kind of business logic that etailers use day in and day out. While many e-commerce platforms are modifications of other web platforms (often blogs or publishing software) to make them apply more to products than articles, Magento is structured as an infinitely flexible shopping cart and product management system.
One of the key factors that make it such an adaptable solution is its EAV (“Entity-Attribute-Value“) database structure that allows product managers — without any knowledge of programming or database design — to create entirely new data structures and handle changing product needs.
In this article, we’ll take a look at what EAV is, how it works both in general and within the Magento platform, and why you should (and shouldn’t) use it.
Designing any good website or business application involves making it future-proof — that is, ensuring that the app can grow with the changing needs of its users and owners alike. Often additional functionality will be implemented later once more is known about how people are using the site and what sort of things are important to them.
Key to making this expansion easy is having a database structure that can accommodate change and is not locked into a restrictive definition. Nowhere is this truer than with e-commerce sites, where a store that is one day selling only T-shirts may, months later, offer DVDs, books, and music.
These are very different sets of data to deal with — while customers shopping for T-shirts may primarily be interested in being able to look through a catalog by color, size, and style, consumers of movies want to know about genre, rating, format, directors, actors, and so forth. We quickly run into a problem where the sets of information we want to store for products — which shirts, movies, and books all are — are completely disjointed depending on the type of product.
One solution is to have every product definition contain all of the attributes possible for any type of product, and only displaying data that is relevant to the particular product type being accessed. So a book product would simply store nothing for its director, color, size, or actors. While this will certainly work, it requires being able to foresee every possible product attribute right from the beginning, and becomes a tremendous waste of database resources when 100,000+ products are storing empty data for dozens of attributes that don’t apply to them.
Another solution is treating these different types of products as totally separate entities. Books can be stored in a book table, shirts in a shirt table, and movies in a movie table. This ensures that each product type only has data relevant to what it is.
Unfortunately, this flies in the face of maintaining a logical structure where similar entities are grouped; after all, clothes, books, and movies are still all products, and share attributes like price, category, sale price, quantity, etc. This can be solved with traditional database normalization, but a software engineer is still needed to implement any new product types.
How does EAV work?
People new to EAV may initially be quite confused. Looking at a row of data for a product in what one would assume to be a products table (in Magento, “catalog_product_entity“), you’ll find nothing about price, color, or even name or description. Instead all you see is a set of IDs, a SKU, and created and updated dates. Of course, the data is there…it’s just not in the table you’d expect. Let’s take a step back:
” in EAV stands for Entity. This is an item you’re storing information about — such as a product, a customer, a category, or an order — that normally would make up your database tables.
” stands for Attribute. An attribute is an abstracted version of what a traditional DB structure would see as a column (i.e., a piece of information that applies to an entity). This would be things like price, size, quantity, etc.
” stands for Value. The value is, quite simply, the value of any particular attribute.
A traditional database schema has all three of these represented in one table. The entity is the row (uniquely identified by a primary key), the attributes are the columns of the table, and the values are the actual data stored in each row. The key to EAV is that each of these are separated out into different tables — each quite generic.
For instance, Magento’s attribute table (“eav_attribute“), stores information about what entity type the attribute applies to, what code it is uniquely identified by, how the data is input/output and modeled in the front-end and backend (which is a link to a PHP class), the MySQL data type (int, varchar, static, text, datetime, decimal), and flags representing whether the attribute is required, must be unique, can be deleted, and its default value. But still you’ll notice that the data itself is nowhere to be found.
The actual data is in the value tables. You’ll notice that’s tables plural because different table definitions are required depending on the type of data stored. Indeed, Magento’s value tables for product attributes, for instance, are named “catalog_product_entity_datetime“, “catalog_product_entity_decimal“, “catalog_product_entity_int“, etc.
These tables all have primary keys and IDs that link them to the attribute they’re storing data for and the entity that “owns” that value, but have only one “actual” data column, simply called “value”. This (finally) is where all of the data is stored: in the “value” column of one of these value tables.
Okay, so, why should I bother?
EAV is undoubtedly a far more complex way of managing data than a normal, “flat” database structure. But it has two huge advantages: flexibility and metadata. In the books/movies/T-shirts example we discussed initially, I don’t need one giant table to cover all possible attributes, nor do I need very different table definitions for each product type.
Every book, movie, or shirt is a product entity stored in an entity table with an enum that identifies what kind of product it is. Every attribute is stored as a row in an attribute table (author, publisher, color, size, director, year released, etc) that identifies what type of data it is (author: varchar, year released: datetime), what type of entity it applies to, and how the data is input (size: select, year: calendar) and validated. Here we also have the opportunity to store a vast amount of metadata about each attribute, for instance indicating that certain attributes (quantity, actual cost) should not be shown on the frontend, that some (price) are required, and more.
Magento has a whole table, “catalog_eav_attribute“, that is dedicated to metadata for catalog entity attributes. Things like whether the customer can filter results by the attribute, whether promotional rules can depend on the attribute, and much more. With a flat database structure, this would take another set of columns for every attribute you wanted to apply metadata to, quickly leading to a behemoth of a table and a data manipulation nightmare. With EAV, values are only stored for fields that actually have a value — there is no need to list a T-shirt’s author, title, date, and pagecount as NULL.
However, while EAV is a fantastically flexible and powerful, it’s also fantastically complex. Implementing a EAV structure for a blogging site is like outfitting a pocket calculator with a Core i7 processor: it’s overkill. Querying for data with an EAV model can be slow and very difficult to understand since data needs to be joined and selected from a half-dozen (or more) tables just to retrieve one value.
Luckily, caching techniques and temporary flat tables can help mitigate the performance penalty, but straying outside of the Magento ORM (“Object Relational Mapping“) can quickly lead to improperly stored data and a lot of head scratching. But when used correctly, and when a product manager can list the searchable, filterable, and comparable viscosity of the different varieties of wonder putty he or she sells online without needing to build an entirely new model around it, it’s really quite a powerful tool.