Datastore Bulk Upload + ReferenceProperty in Google App Engine
I have been working on my first Google App Engine app (a vintage camera database), and I have just got to the point when I need to upload some real data to the Datastore. Essentially, I have an CSV containing all the camera information, including the manufacturer's information and I want to use google.appengine.tools.bulkloader.Loader to upload all this information to Datastore. I have two separate entities representing "camera" and "manufacturer", and I would like to normalize the data as such.
Did someone shout, "Normalization + Bigtable = Bad"? Well, I have yet to make up my mind on this, and for now, I have decided to go with a more traditional relational database model. This may change in the future, but I decided I should still share my experience with uploading relational data as I feel that many people are probably trying to do the same thing.
Before I carry on, these are the two (simplified) classes:
and the CSV:
Effectively, I want four instances of camera and two instances of manufacturer created, with the cameras correctly referencing the manufacturers.
Google App Engine comes with the google.appengine.tools.bulkloader.Loader class which facilitates the process of bulk uploading data to both the development server (local) datastore and the real (cloud) datastore. It supports basic mapping of CSV fields to object attributes. However, dealing with ReferenceProperty turns out to be not so straightforward. A quick Google search led to a few articles suggesting overridding the HandleEntity (or handle_entity?) function when subclassing google.appengine.tools.bulkloader.Loader. Unfortunately, for some reason, I did not manage to get that to work. After some experimentation, I stumbled across my own solution, which actually is quite simple :-)
DISCLAIMER: I HAVE ONLY TRIED UPLOADING DATA TO MY LOCAL DEVELOPMENT SERVER (GOOGLE APP ENGINE 1.2.3) USING THIS METHOD, AND NOT TO THE REAL CLOUD YET...
Instead of injecting the reference in the HandleEntity function, I create a function called get_manufacturer which looks in the datastore for an existing manufacturer based on the manufacturer's name, and returns its key (or the instance itself) if found, or creates a new instance in the datastore before returning the newly assigned key otherwise. I then make that the transformation function for the manufacturer ReferenceProperty.
The following is my loader class with the magic function:
Please refer to http://code.google.com/appengine/docs/python/tools/uploadingdata.html for details on how to do the actual bulk upload.
Did someone shout, "Normalization + Bigtable = Bad"? Well, I have yet to make up my mind on this, and for now, I have decided to go with a more traditional relational database model. This may change in the future, but I decided I should still share my experience with uploading relational data as I feel that many people are probably trying to do the same thing.
Before I carry on, these are the two (simplified) classes:
class Manufacturer(db.Model):
name = db.StringProperty(required=True)
class Camera(db.Model):
name = db.StringProperty(required=True)
weight = db.IntegerProperty()
introduction_year = db.IntegerProperty()
max_aperture_value = db.FloatProperty()
e_mode_p = db.BooleanProperty(default=False)
e_mode_sp = db.BooleanProperty(default=False)
e_mode_ap = db.BooleanProperty(default=False)
e_mode_m = db.BooleanProperty(default=False)
note = db.TextProperty()
manufacturer = db.ReferenceProperty(Manufacturer)
and the CSV:
"Canonet G-III QL17",Canon,620,1972,1.7,FALSE,TRUE,FALSE,TRUE,
"Canonet G-III QL19",Canon,620,1972,1.9,FALSE,TRUE,FALSE,TRUE,
Demi,Canon,380,1963,2.8,FALSE,TRUE,FALSE,TRUE,"Demi is a half-frame camera"
"TRIP 35",Olympus,390,1967,2.8,TRUE,FALSE,FALSE,FALSE,
Effectively, I want four instances of camera and two instances of manufacturer created, with the cameras correctly referencing the manufacturers.
Google App Engine comes with the google.appengine.tools.bulkloader.Loader class which facilitates the process of bulk uploading data to both the development server (local) datastore and the real (cloud) datastore. It supports basic mapping of CSV fields to object attributes. However, dealing with ReferenceProperty turns out to be not so straightforward. A quick Google search led to a few articles suggesting overridding the HandleEntity (or handle_entity?) function when subclassing google.appengine.tools.bulkloader.Loader. Unfortunately, for some reason, I did not manage to get that to work. After some experimentation, I stumbled across my own solution, which actually is quite simple :-)
DISCLAIMER: I HAVE ONLY TRIED UPLOADING DATA TO MY LOCAL DEVELOPMENT SERVER (GOOGLE APP ENGINE 1.2.3) USING THIS METHOD, AND NOT TO THE REAL CLOUD YET...
Instead of injecting the reference in the HandleEntity function, I create a function called get_manufacturer which looks in the datastore for an existing manufacturer based on the manufacturer's name, and returns its key (or the instance itself) if found, or creates a new instance in the datastore before returning the newly assigned key otherwise. I then make that the transformation function for the manufacturer ReferenceProperty.
The following is my loader class with the magic function:
from google.appengine.ext import db
from google.appengine.tools import bulkloader
import model
def get_manufacturer(name):
manufacturers = db.GqlQuery("select * from Manufacturer where name = :1", name)
if manufacturers.count() == 0:
newManufacturer = model.Manufacturer(name=name)
db.put(newManufacturer)
return newManufacturer
else:
return manufacturers[0]
class CameraLoader(bulkloader.Loader):
def __init__(self):
bulkloader.Loader.__init__(self, "Camera",
[("name", str),
("manufacturer", get_manufacturer),
("weight", int),
("introduction_year", int),
("max_aperture_value", float),
("e_mode_p", bool),
("e_mode_sp", bool),
("e_mode_ap", bool),
("e_mode_m", bool),
("note", str)])
loaders = [CameraLoader]
if __name__ == '__main__':
bulkload.main(CameraLoader)
Please refer to http://code.google.com/appengine/docs/python/tools/uploadingdata.html for details on how to do the actual bulk upload.
Comments