>>> from mongokit import *
>>> import datetime
>>> class BlogPost(MongoDocument):
...     db_name = 'test'
...     collection_name = 'tutorial'
...     structure = {
...         'title':unicode,
...         'body':unicode,
...         'author':unicode,
...         'date_creation':datetime.datetime,
...         'rank':int
...     }
...     required_fields = ['title','author', 'date_creation']
...     default_values = {'rank':0, 'date_creation':datetime.datetime.utcnow}
...
A MongoDocument takes a db_name and a collection_name as attributes. Next, you have to specify a structure. The structure is simply a dictionary that maps field names to Python types. In this example, title must be unicode and rank must be an int.
Optionally, you can add some descriptors. To specify which fields are required, add a required_fields attribute. This is a simple list of all required fields (i.e., those fields must not be None when validating).
The same goes for the default_values attribute. This is a dict in which you specify default values. Note that you can pass a callable object (like datetime.datetime.utcnow).
But first, some cleanup:
>>> Connection().test.tutorial.remove({})
Now, let’s create a blog post:
>>> bp = BlogPost()
>>> bp # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
{'body': None, 'title': None, 'date_creation': datetime.datetime(...), 'rank': 0, 'author': None}
Note that date_creation was automatically filled by utcnow() and rank is 0.
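How such a skeleton might be built from the structure and default_values can be sketched in plain Python. This is a simplified illustration, not MongoKit’s implementation: build_skeleton is a hypothetical helper, and str stands in for unicode so that the snippet runs on Python 3.

```python
import datetime

def build_skeleton(structure, default_values):
    """Return a dict with every structure key set to None or to its default."""
    doc = {}
    for key in structure:
        default = default_values.get(key)
        # A callable default (a function returning the current time, for
        # instance) is called at creation time; other defaults are used as-is.
        doc[key] = default() if callable(default) else default
    return doc

structure = {
    'title': str,
    'rank': int,
    'date_creation': datetime.datetime,
}
default_values = {'rank': 0, 'date_creation': datetime.datetime.now}

post = build_skeleton(structure, default_values)
print(post['title'], post['rank'])
```

Fields without a default start as None, which is why required fields still have to be filled in before validation succeeds.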
>>> bp['title'] = "my first blog post"
>>> bp.validate()
...
SchemaTypeError: title must be an instance of unicode not str
The str type is not allowed; you must use unicode:
>>> bp['title'] = u"my first blog post"
The validate method also checks that required fields are set:
>>> bp.validate()
...
RequireFieldError: author is required
>>> bp['author'] = u'myself'
>>> bp.validate()
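The two checks seen so far (type checking and required fields) can be sketched without a database. This is a minimal sketch under stated assumptions, not MongoKit’s code: the validate function and the built-in exceptions it raises are stand-ins, and str replaces unicode so that it runs on Python 3.

```python
def validate(doc, structure, required_fields):
    """Raise if a field has the wrong type or a required field is None."""
    for key, expected in structure.items():
        value = doc.get(key)
        # None means "not set yet" and is only rejected for required fields.
        if value is not None and not isinstance(value, expected):
            raise TypeError('%s must be an instance of %s not %s' % (
                key, expected.__name__, type(value).__name__))
    for key in required_fields:
        if doc.get(key) is None:
            raise ValueError('%s is required' % key)

structure = {'title': str, 'author': str, 'rank': int}
post = {'title': 'my first blog post', 'author': None, 'rank': 0}

try:
    validate(post, structure, ['title', 'author'])
except ValueError as exc:
    error = str(exc)  # the missing author is reported

post['author'] = 'myself'
validate(post, structure, ['title', 'author'])  # passes silently
```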
Note that save calls the validate method, so you don’t have to validate each time.
>>> bp.save() # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
{'body': None, 'title': u'my first blog post', 'author': u'myself', 'rank': 0, '_id': ..., 'date_creation': datetime.datetime(...)}
There are 4 ways to query a collection: all(), one(), fetch() and fetch_one(). all() and fetch() return a cursor over the collection. A cursor is a container which lazily evaluates the results and acts like an iterator. one() and fetch_one() return the document itself.
All these methods can take a query as argument. A query is a simple dict. Please see the MongoDB and pymongo documentation for further details.
New in development version
Note that when you make a query, the collection is passed automatically to the result objects. You can then specify a collection dynamically when making queries.
all() without arguments returns a cursor over all documents of the collection. If a query is passed, it returns a cursor over all documents which match the query. The query is run against the db and collection of the object.
all() takes the same arguments as the pymongo.collection.find method.
>>> bp = BlogPost()
>>> bp['title'] = u"my second blog post"
>>> bp['author'] = u"myself"
>>> bp.save() # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
{'body': None, 'title': u'my second blog post', 'author': u'myself', 'rank': 0, '_id': ..., 'date_creation': datetime.datetime(...)}
>>> for post in BlogPost.all():
...     print post['title']
my first blog post
my second blog post
>>> for post in BlogPost.all({'title':'my first blog post'}):
...     print post['title']
my first blog post
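The lazy behaviour of a cursor can be illustrated with a plain Python generator. This is only an analogy under simplifying assumptions: the find function below is hypothetical, works on an in-memory list, and supports nothing more than exact-match queries.

```python
def find(documents, query=None):
    """Lazily yield the documents whose fields equal those in `query`."""
    query = query or {}
    for doc in documents:
        if all(doc.get(key) == value for key, value in query.items()):
            yield doc

posts = [
    {'title': 'my first blog post', 'author': 'myself'},
    {'title': 'my second blog post', 'author': 'myself'},
]

cursor = find(posts, {'title': 'my first blog post'})
# Nothing has been evaluated yet; iterating drives the generator.
titles = [post['title'] for post in cursor]
print(titles)
```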
one() acts like all() but raises a mongokit.MultipleResultsFound exception if there is more than one result.
>>> BlogPost.one()
...
MultipleResultsFound: 2 results found
>>> BlogPost.one({'title':'my first blog post'}) # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
{u'body': None, u'title': u'my first blog post', u'author': u'myself', u'rank': 0, u'_id': ..., u'date_creation': datetime.datetime(...)}
If no document is found, one() returns None.
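The three possible outcomes of one() can be summed up in a few lines. This is a sketch of the behaviour described above, not MongoKit’s source; the MultipleResultsFound class defined here is a local stand-in for mongokit.MultipleResultsFound.

```python
class MultipleResultsFound(Exception):
    """Stand-in for mongokit.MultipleResultsFound."""

def one(matches):
    """Return the single match, None if there is none, else raise."""
    matches = list(matches)
    if len(matches) > 1:
        raise MultipleResultsFound('%d results found' % len(matches))
    return matches[0] if matches else None

posts = [{'title': 'first'}, {'title': 'second'}]
single = one(p for p in posts if p['title'] == 'first')
missing = one(p for p in posts if p['title'] == 'third')
```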
Unlike all(), fetch() will return only documents which match the structure of the Document.
>>> all_blog_posts = BlogPost.fetch()
This returns only the blog posts (documents which have ‘title’, ‘body’, ‘author’, ‘date_creation’ and ‘rank’ as fields). It is a helper for:
>>> all_blog_posts = BlogPost.all({'body': {'$exists': True}, 'title': {'$exists': True}, 'date_creation': {'$exists': True}, 'rank': {'$exists': True}, 'author': {'$exists': True}})
Note that, like all() and one(), fetch() accepts advanced queries:
>>> my_blog_posts = BlogPost.fetch({'author':'myself'})
which is equivalent to:
>>> all_blog_posts = BlogPost.all({'body': {'$exists': True}, 'title': {'$exists': True}, 'date_creation': {'$exists': True}, 'rank': {'$exists': True}, 'author': 'myself'})
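The equivalence above suggests how the query passed to all() could be derived from the structure. A minimal sketch, assuming a flat structure; build_fetch_query is a hypothetical name, not a MongoKit function.

```python
def build_fetch_query(structure, query=None):
    """Require every structure field to exist, then apply the user query."""
    fetch_query = {key: {'$exists': True} for key in structure}
    # Conditions given by the caller replace the default $exists test.
    fetch_query.update(query or {})
    return fetch_query

structure = {'title': str, 'body': str, 'author': str,
             'date_creation': object, 'rank': int}
query = build_fetch_query(structure, {'author': 'myself'})
```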
fetch_one() works just like fetch() but raises a mongokit.MultipleResultsFound exception if there is more than one result.
Each MongoDocument must have a db_host, a db_port, a db_name and a collection_name. These attributes are set when describing Mongo documents. The connection, the db and the collection are then automatically created and attached to the object.
But you might need to specify a different db or collection dynamically. For instance, say you want to store each user in its own database. You can’t set db_name and collection_name on the class because they change for each user.
>>> class User(MongoDocument):
...     structure = {'login':unicode, 'screen_name':unicode}
...     # we can't set `db_name` here...
MongoKit allows you to change those parameters on the fly. To do so, create another collection with the get_collection helper.
>>> user_col = User.get_collection(db_name='namlook', collection_name='profile')
New in development version
You can also set a connection:
>>> con = Connection()
>>> user_col = User.get_collection(connection=con, db_name='namlook', collection_name='profile')
Now we can query the database by passing our new collection:
>>> User.all({}, collection=user_col)
You can also pass those values to MongoDocument.__init__:
>>> user = User(db_name='namlook', collection_name='profile')
>>> user['login'] = u'namlook'
>>> user['screen_name'] = u'Namlook'
Calling user.save() will save the object into the ‘namlook’ database, in the ‘profile’ collection.
New in development version
You can pass a connection when instantiating a document. This is useful for sharing a connection between threads in web development, for instance:
>>> con = Connection()
>>> user = User(connection=con)
New in development version
If you have already created a collection, you can pass it to the constructor:
>>> namlook_profile_collection = User.get_collection(db_name='namlook', collection_name='profile')
>>> user = User(collection=namlook_profile_collection)
Note that you can’t specify other db parameters if you pass a collection. So the following example won’t work:
>>> user = User(db_name='john', collection=namlook_profile_collection) # DON'T DO THAT !
New in development version
Sometimes you may want to retrieve many documents in a short time:
>>> for doc in MyDoc.fetch():
...     pass # do something with doc... slow !!
This isn’t very efficient: wrapping each result into a MongoDocument slows things down. An alternative is to retrieve raw data by telling MongoKit not to wrap the results:
>>> for doc in MyDoc.fetch(wrap=False):
...     pass # do something with doc... very fast !
Note that by doing this, you won’t be able to use MongoKit’s methods on the results.
If you want to use dot notation (à la JSON), you must set the use_dot_notation attribute to True:
>>> class TestDotNotation(MongoDocument):
...     structure = {
...         "foo":{ "bar":unicode}
...     }
...     use_dot_notation=True
>>> doc = TestDotNotation()
>>> doc.foo.bar = u"bla"
>>> doc
{'foo': {'bar': u'bla'}}
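Dot notation of this kind is commonly built on attribute hooks. The following is a generic sketch of that technique, assuming nothing about how MongoKit implements it; DotDict is a hypothetical class.

```python
class DotDict(dict):
    """A dict whose keys can also be read and written as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            value = self[name]
        except KeyError:
            raise AttributeError(name)
        # Wrap nested plain dicts in place so chained access keeps writes.
        if isinstance(value, dict) and not isinstance(value, DotDict):
            value = DotDict(value)
            self[name] = value
        return value

    def __setattr__(self, name, value):
        self[name] = value

doc = DotDict({'foo': {'bar': None}})
doc.foo.bar = 'bla'
print(doc)
```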
If you need a structured list with a fixed number of fields, you can use a tuple to describe your object:
>>> class MyDoc(SchemaDocument):
...     structure = {
...         "foo":(int, unicode, float)
...     }
>>> mydoc = MyDoc()
>>> mydoc['foo']
[None, None, None]
Tuples are converted into a simple list. They add another validation layer. Each field must have the right type:
>>> mydoc['foo'] = [u"bla", 1, 4.0]
>>> mydoc.validate()
...
SchemaTypeError: foo must be an instance of int not unicode
and they must have the right number of items:
>>> mydoc['foo'] = [1, u"bla"]
>>> mydoc.validate()
...
SchemaTypeError: foo must have 3 items not 2
As tuples are converted to lists internally, you can perform all list operations:
>>> mydoc['foo'] = [1,u'bar',3.2]
>>> mydoc.validate()
>>> mydoc['foo'] = [None, u"bla", 3.1]
>>> mydoc.validate()
>>> mydoc['foo'][0] = 50
>>> mydoc.validate()
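The two rules for tuple fields (right number of items, right type in each position) can be sketched as a standalone check. This is an illustration only: validate_tuple_field is a hypothetical helper, built-in exceptions replace SchemaTypeError, and str replaces unicode so that it runs on Python 3.

```python
def validate_tuple_field(name, value, types):
    """Check a list against a tuple of expected types, position by position."""
    if len(value) != len(types):
        raise ValueError('%s must have %d items not %d'
                         % (name, len(types), len(value)))
    for item, expected in zip(value, types):
        # None is accepted in any position, as in the examples above.
        if item is not None and not isinstance(item, expected):
            raise TypeError('%s must be an instance of %s not %s'
                            % (name, expected.__name__, type(item).__name__))

errors = []
for candidate in ([1, 'bar', 3.2], ['bla', 1, 4.0], [1, 'bla']):
    try:
        validate_tuple_field('foo', candidate, (int, str, float))
    except (TypeError, ValueError) as exc:
        errors.append(str(exc))
```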
{} is used to describe a nested structure, like {“foo”:unicode, “bar”:int}:
>>> class Person(MongoDocument):
...     structure = {
...         "biography": {"foo":unicode, "bar":int}
...     }
If you don’t specify the structure:
>>> class Person(MongoDocument):
...     structure = {
...         "biography": {}
...     }
you won’t be able to do the following, because “foo” is not defined in the structure:
>>> bob = Person()
>>> bob[u"biography"][u"foo"] = u"bla"
>>> bob.validate()
...
StructureError: unknown fields : [u'foo']
If you want to be able to add new items to a dict even if they’re not defined in the structure, you must use the dict type instead:
>>> class Person(MongoDocument):
...     structure = {
...         "biography": dict
...     }
>>> bob = Person()
>>> bob[u"biography"][u"foo"] = u"bla"
>>> bob.validate()
Using the dict type is useful if you don’t know which fields will be added or what their types will be. If you know the type of the fields, it’s better to do this:
>>> class Person(MongoDocument):
...     structure = {
...         "biography": {unicode:unicode}
...     }
This adds another layer to validate the content. See the “validate keys” section for more information.
By default, MongoKit allows the following types:
authorized_types = [
    type(None),
    bool,
    int,
    float,
    long,
    unicode,
    list,
    dict,
    datetime.datetime,
    pymongo.binary.Binary,
    pymongo.objectid.ObjectId,
    pymongo.dbref.DBRef,
    pymongo.code.Code,
    type(re.compile("")),
    CustomType,
]
It’s possible to add more types to authorized_types:
>>> class MyDoc(MongoDocument):
...     structure = {
...         "foo":str,
...     }
...     authorized_types = MongoDocument.authorized_types + [str]
>>> mydoc = MyDoc()
>>> mydoc['foo'] = 'bla'
>>> mydoc.validate()
If the value of a key is not known but we want to validate some deeper structure, we use the “$<type>” descriptor:
>>> class MyDoc(MongoDocument):
...     structure = {
...         "key1":{
...             unicode:{
...                 "bla":int,
...                 "bar":{unicode:int}
...             },
...         },
...         "bla":float,
...     }
...     required_fields = ["key1.$unicode.bla"]
...
Note that if you use a Python type as a key in the structure, generate_skeleton won’t be able to build the entire underlying structure:
>>> MyDoc() == {'key1': {}, 'bla': None}
True
So neither default_values nor signals will work for those fields.
It is possible to add another layer of validation to fields.
Let’s say that we have a field which can be unicode, int or float. We can use the OR operator to tell MongoKit how to validate the field:
>>> from mongokit import OR
>>> from datetime import datetime
>>> class Account(MongoDocument):
...     structure = {
...         "balance": {'foo': OR(unicode, int, float)}
...     }
>>> account = Account()
>>> account['balance']['foo'] = u'3.0'
>>> account.validate()
>>> account['balance']['foo'] = 3.0
>>> account.validate()
but :
>>> account['balance']['foo'] = datetime.now()
>>> account.validate()
...
SchemaTypeError: balance.foo must be an instance of <unicode or int or float> not datetime
You can also use the NOT operator to tell MongoKit that you don’t want a given type for a field:
>>> from mongokit import NOT
>>> class Account(MongoDocument):
...     structure = {
...         "balance": {'foo': NOT(unicode, datetime)}
...     }
>>> account = Account()
>>> account['balance']['foo'] = 3
>>> account.validate()
>>> account['balance']['foo'] = 3.0
>>> account.validate()
and :
>>> account['balance']['foo'] = datetime.now()
>>> account.validate()
...
SchemaTypeError: balance.foo must be an instance of <not unicode, not datetime> not datetime
>>> account['balance']['foo'] = u'3.0'
>>> account.validate()
...
SchemaTypeError: balance.foo must be an instance of <not unicode, not datetime> not unicode
Sometimes you might want to force a field to take one of a set of specific values. The IS operator is used for this purpose:
>>> from mongokit import IS
>>> class Account(MongoDocument):
...     structure = {
...         "flag": {'foo': IS(u'spam', u'controversy', u'phishing')}
...     }
>>> account = Account()
>>> account['flag']['foo'] = u'spam'
>>> account.validate()
>>> account['flag']['foo'] = u'phishing'
>>> account.validate()
and :
>>> account['flag']['foo'] = u'foo'
>>> account.validate()
...
SchemaTypeError: flag.foo must be in [u'spam', u'controversy', u'phishing'] not foo
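The semantics of the three operators can be summarised with small stand-in classes. These are not MongoKit’s OR, NOT and IS: the classes below only reuse the names for readability, and the accepts method is a hypothetical interface invented for this sketch.

```python
import datetime

class OR(object):
    """Accept a value that is an instance of any of the given types."""
    def __init__(self, *types):
        self.types = types
    def accepts(self, value):
        return isinstance(value, self.types)

class NOT(object):
    """Accept a value that is an instance of none of the given types."""
    def __init__(self, *types):
        self.types = types
    def accepts(self, value):
        return not isinstance(value, self.types)

class IS(object):
    """Accept a value only if it is one of the listed values."""
    def __init__(self, *values):
        self.values = values
    def accepts(self, value):
        return value in self.values

balance = OR(str, int, float)
no_text = NOT(str, datetime.datetime)
flag = IS('spam', 'controversy', 'phishing')
print(balance.accepts(3.0), no_text.accepts('3.0'), flag.accepts('foo'))
```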
If using a validator is not enough, you can override the validate method to fit your needs.
Take the following document as an example:
>>> class MyDoc(MongoDocument):
...     structure={
...         "foo":int,
...         "bar":int,
...         "baz":unicode,
...     }
...
We want to be sure that, before saving our object, foo is greater than bar and baz is unicode(foo). To do that, we just override the validate method:
    def validate(self):
        assert self['foo'] > self['bar']
        assert self['baz'] == unicode(self['foo'])
        super(MyDoc, self).validate()
Once your application is ready for production and you are sure that the data is consistent, you might want to skip the validation layer. This makes MongoKit significantly faster (as fast as pymongo). To do that, just set the skip_validation attribute to True.
TIP: it is a good idea to create a “RootDocument” and to have all your objects inherit from it. This lets you control the behavior of all your objects through the RootDocument’s settings:
>>> class RootDocument(MongoDocument):
...     structure = {}
...     skip_validation = True
...     use_dot_notation = True
...     use_autorefs = True
>>> class MyDoc(RootDocument):
...     structure={
...         "foo":int,
...     }
Note that you can always force validation when saving, even if skip_validation is True:
>>> mydoc = MyDoc()
>>> mydoc['foo'] = u'bar'
>>> mydoc.save(validate=True)
...
SchemaTypeError: foo must be an instance of int not unicode
Sometimes we need to work with complex objects while their footprint in the database is fairly simple. Take a datetime object: it is useful for date computations, and although MongoDB can store datetime objects, let’s say that we just want to store the unicode representation.
MongoKit lets the user work with a datetime object and store its unicode representation on the fly. To do this, we have to implement a CustomType and fill the custom_types attribute:
>>> import datetime
A CustomType object must implement two methods and one attribute:
- to_bson(self, value): converts the value to an authorized type before it is saved in the db.
- to_python(self, value): converts the value taken from the db into a Python object.
You must specify a mongo_type property in the CustomType class. It describes the type of the value stored in MongoDB.
If you want more validation, you can specify a python_type property, which is the Python type the value will be converted to. Specifying it is a good idea, as it also serves as documentation.
>>> class CustomDate(CustomType):
...     mongo_type = unicode
...     python_type = datetime.datetime # optional, just for more validation
...     def to_bson(self, value):
...         """convert type to a mongodb type"""
...         return unicode(datetime.datetime.strftime(value,'%y-%m-%d'))
...     def to_python(self, value):
...         """convert type to a python object"""
...         if value is not None:
...             return datetime.datetime.strptime(value, '%y-%m-%d')
Now, let’s create a MongoDocument:
>>> class Foo(MongoDocument):
...     db_name = 'test'
...     collection_name = 'tutorial'
...     structure = {
...         'foo':{
...             'date': CustomDate(),
...         },
...     }
...     default_values = {'foo.date':u'08-06-07'}
Now we can create Foo objects and work with Python datetime objects:
>>> foo = Foo()
>>> foo['_id'] = 1
>>> foo['foo']['date'] = datetime.datetime(2003,2,1)
>>> foo.save()
{'foo': {'date': datetime.datetime(2003, 2, 1, 0, 0)}, '_id': 1}
The saved object in db has the unicode footprint as expected:
>>> foo.collection.find_one({'_id':1})
{u'_id': 1, u'foo': {u'date': u'03-02-01'}}
Querying an object automatically converts the CustomType into the correct Python object:
>>> foo = Foo.get_from_id(1)
>>> foo['foo']['date']
datetime.datetime(2003, 2, 1, 0, 0)
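The round trip performed by the two conversion methods can be checked without a database. The class below mirrors the CustomDate example but is a standalone Python 3 sketch: it does not inherit from MongoKit’s CustomType, and str stands in for unicode.

```python
import datetime

class CustomDate(object):
    """Standalone stand-in for the CustomDate type defined above."""
    mongo_type = str
    python_type = datetime.datetime

    def to_bson(self, value):
        """Convert the datetime to the string stored in the db."""
        return value.strftime('%y-%m-%d')

    def to_python(self, value):
        """Convert the stored string back to a datetime."""
        if value is not None:
            return datetime.datetime.strptime(value, '%y-%m-%d')

custom_date = CustomDate()
stored = custom_date.to_bson(datetime.datetime(2003, 2, 1))
loaded = custom_date.to_python(stored)
print(stored, loaded)
```

Note that the '%y' format keeps only two digits of the year, so the stored form loses the century and any time of day; a format such as '%Y-%m-%d' would keep the full year.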
MongoKit has optional support for MongoDB’s autoreferencing/dbref features. Autoreferencing allows you to embed MongoKit objects/instances inside another MongoKit object. With autoreferencing enabled, MongoKit and the pymongo driver will translate the embedded MongoKit object values into internal MongoDB DBRefs. The (de)serialization is handled automatically by the pymongo driver.
Autoreferences allow you to pass other MongoDocuments as values. pymongo (with help from MongoKit) automatically translates these object values into DBRefs before persisting to Mongo. When fetching, it translates them back, so that you have the data values for your referenced object. See the autoref_sample for further details/internals on this driver-level functionality. To enable it in your own MongoKit code, simply define the following class attribute on your Document subclass:
use_autorefs = True
With autoref enabled, MongoKit’s connection management will attach the appropriate BSON manipulators to your document’s connection handles. We require you to explicitly enable autoref for two reasons:
- Using autoref and its BSON manipulators (as well as DBRefs) can carry a performance penalty. We opt for performance and simplicity first, so you must explicitly enable autoreferencing.
- You may not wish to use auto-referencing in some cases where you’re using DBRefs.
Once you have autoref enabled, MongoKit will allow you to define any valid subclass of MongoDocument as part of your document structure. If your class does not define `use_autorefs` as True, MongoKit’s structure validation code will REJECT your structure.
First let’s create a simple doc:
>>> class DocA(MongoDocument):
...     db_name = "test"
...     collection_name = "tutorial"
...     structure = {
...         "a":{'foo':int},
...         "abis":{'bar':int},
...     }
...     default_values = {'a.foo':2}
...     required_fields = ['abis.bar']
>>> doca = DocA()
>>> doca['_id'] = 'doca'
>>> doca['abis']['bar'] = 3
>>> doca.save()
{'a': {'foo': 2}, 'abis': {'bar': 3}, '_id': 'doca'}
Now, let’s create a DocB which has a reference to DocA:
>>> class DocB(MongoDocument):
...     db_name = "test"
...     collection_name = "tutorial"
...     structure = {
...         "b":{"doc_a":DocA},
...     }
...     use_autorefs = True
Note that to be able to use a MongoDocument in the structure, we must set use_autorefs to True.
>>> docb = DocB()
The default value for an embedded doc is None:
>>> docb
{'b': {'doc_a': None}}
Validation works as expected:
>>> docb['b']['doc_a'] = 4
>>> docb.validate()
...
SchemaTypeError: b.doc_a must be an instance of DocA not int
>>> docb['_id'] = 'docb'
>>> docb['b']['doc_a'] = doca
>>> docb
{'b': {'doc_a': {'a': {'foo': 2}, 'abis': {'bar': 3}, '_id': 'doca'}}, '_id': 'docb'}
Now the interesting part. If we change a field in an embedded doc, the change will apply to all DocA instances which have the same ‘_id’:
>>> docb['b']['doc_a']['a']['foo'] = 4
>>> docb.save()
{'b': {'doc_a': {'a': {'foo': 4}, 'abis': {'bar': 3}, '_id': 'doca'}}, '_id': 'docb'}
>>> doca['a']['foo']
4
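Why the change propagates can be understood with a toy model of references. This is a conceptual sketch only, with an in-memory dict in place of MongoDB and hypothetical save and load helpers; the real translation to and from DBRefs is done by pymongo and MongoKit.

```python
collection = {}  # stands in for a MongoDB collection, keyed by _id

def save(doc):
    """Store `doc`, replacing embedded documents by references."""
    stored = {}
    for key, value in doc.items():
        if isinstance(value, dict) and '_id' in value:
            save(value)  # the embedded document is saved on its own
            stored[key] = {'$ref': 'tutorial', '$id': value['_id']}
        else:
            stored[key] = value
    collection[doc['_id']] = stored

def load(_id):
    """Fetch a document, resolving references back into documents."""
    doc = {}
    for key, value in collection[_id].items():
        if isinstance(value, dict) and '$ref' in value:
            doc[key] = load(value['$id'])
        else:
            doc[key] = value
    return doc

doca = {'_id': 'doca', 'foo': 2}
docb = {'_id': 'docb', 'doc_a': doca}
save(docb)

docb['doc_a']['foo'] = 4  # change the embedded document...
save(docb)                # ...and save the parent
print(load('doca'))
```

Because the parent only stores a reference, there is a single copy of the embedded document, and every document pointing at the same ‘_id’ sees the new value.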
Required fields are also supported in embedded documents. Remember that DocA has the ‘abis.bar’ field required. If we set it to None via the docb document, the RequireFieldError is raised with the full path ‘b.doc_a.abis.bar’:
>>> docb['b']['doc_a']['abis']['bar'] = None
>>> docb.validate()
...
RequireFieldError: b.doc_a.abis.bar is required
Sometimes it’s desirable to have indexes on your dataset, especially unique ones. To do that, fill the indexes attribute. The indexes attribute is a list of dictionaries with the following structure:
| Key | Description |
|---|---|
| “fields” | a list of fields or a single field name (required) |
| “unique” | should this index guarantee uniqueness? (optional, False by default) |
| “ttl” | (optional, 300 by default) time window (in seconds) during which this index will be recognized by subsequent calls to ensure_index; see the pymongo documentation for ensure_index for details |
Example:
>>> class MyDoc(MongoDocument):
...     structure = {
...         'standard':unicode,
...         'other':{
...             'deep':unicode,
...         },
...         'notindexed':unicode,
...     }
...
...     indexes = [
...         {
...             'fields':['standard', 'other.deep'],
...             'unique':True,
...         },
...     ]
or if you have more than one index:
>>> class Movie(MongoDocument):
...     db_name = 'test'
...     collection_name = 'mongokit'
...     structure = {
...         'standard':unicode,
...         'other':{
...             'deep':unicode,
...         },
...         'alsoindexed':unicode,
...     }
...
...     indexes = [
...         {
...             'fields':'standard',
...             'unique':True,
...         },
...         {
...             'fields': ['alsoindexed', 'other.deep']
...         },
...     ]
By default, the index direction is set to 1. You can change the direction by passing a dictionary. Direction must be one of INDEX_ASCENDING (or 1) or INDEX_DESCENDING (or -1):
>>> class MyDoc(MongoDocument):
...     structure = {
...         'standard':unicode,
...         'other':{
...             'deep':unicode,
...         },
...         'notindexed':unicode,
...     }
...
...     indexes = [
...         {
...             'fields':{'standard':INDEX_ASCENDING, 'other.deep':INDEX_DESCENDING},
...             'unique':True,
...         },
...     ]
New in development version
Passing a dictionary in “fields” is now deprecated. The problem is that a dictionary is unordered, which matters for compound keys. If you want to index compound keys, you now need to pass a list of tuples:
>>> class MyDoc(MongoDocument):
...     structure = {
...         'standard':unicode,
...         'other':{
...             'deep':unicode,
...         },
...         'notindexed':unicode,
...     }
...
...     indexes = [
...         {
...             'fields':[('standard',INDEX_ASCENDING), ('other.deep',INDEX_DESCENDING)],
...             'unique':True,
...         },
...     ]
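The accepted forms of “fields” (a single name, a list of names, or a list of tuples) can all be reduced to one ordered list of (name, direction) pairs. The helper below is a hypothetical sketch of that normalization, not MongoKit’s code; the two constants are defined locally with the values 1 and -1 given above.

```python
INDEX_ASCENDING = 1
INDEX_DESCENDING = -1

def normalize_fields(fields):
    """Return an ordered list of (field name, direction) tuples."""
    if isinstance(fields, str):
        return [(fields, INDEX_ASCENDING)]
    normalized = []
    for field in fields:
        if isinstance(field, tuple):
            normalized.append(field)  # direction given explicitly
        else:
            normalized.append((field, INDEX_ASCENDING))  # default direction
    return normalized

single = normalize_fields('standard')
compound = normalize_fields([('standard', INDEX_ASCENDING),
                             ('other.deep', INDEX_DESCENDING)])
```

A list keeps the order of the keys, which is exactly what a dictionary could not guarantee for compound indexes.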
It is possible to share a connection between objects. To do this, create a RootDocument (the name doesn’t matter) and set up the connection in it:
>>> class RootDocument(MongoDocument):
...     connection = Connection('localhost', 27017)
Then make all your objects inherit from the RootDocument. This is useful if you are working in a web framework and you want to share only one connection.
When building a web application, you might want to create a REST API with JSON support. You may then need to convert all your MongoDocuments into JSON in order to pass them through the REST API. Unfortunately (or fortunately), MongoDB supports field types which are not supported by JSON. This is the case for datetime, but also for any CustomTypes you may have built and for your embedded objects.
MongoDocument supports JSON import/export. Note that you’ll need to install anyjson to enable this feature (sudo easy_install anyjson).
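The mismatch between MongoDB types and JSON is easy to reproduce with the standard library alone. The sketch below uses Python’s json module rather than anyjson or MongoKit’s own export, and encode_extra is a hypothetical fallback that serialises datetimes as ISO 8601 strings.

```python
import datetime
import json

def encode_extra(value):
    """Fallback used by json.dumps for types it cannot serialise itself."""
    if isinstance(value, datetime.datetime):
        return value.isoformat()
    raise TypeError('%r is not JSON serializable' % (value,))

doc = {'_id': 'mydoc', 'created': datetime.datetime(2003, 2, 1)}
text = json.dumps(doc, default=encode_extra, sort_keys=True)
print(text)
```

Without the default argument, json.dumps raises a TypeError on the datetime value, which is the kind of conversion work a JSON export has to take care of.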
to_json is a simple method which exports your document into a JSON document:
>>> class MyDoc(MongoDocument):
...     db_name = "test"
...     collection_name = "mongokit"
...     structure = {
...         "bla":{
...             "foo":unicode,
...             "bar":int,
...         },
...         "spam":[],
...     }
>>> mydoc = MyDoc()
>>> mydoc['_id'] = u'mydoc'
>>> mydoc["bla"]["foo"] = u"bar"
>>> mydoc["bla"]["bar"] = 42
>>> mydoc['spam'] = range(10)
>>> mydoc.save()
>>> json = mydoc.to_json()
>>> json
'{"_id": "mydoc", "bla": {"foo": "bar", "bar": 42}, "spam": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]}'
To load a JSON string into a MongoDocument, use the from_json class method:
>>> class MyDoc(MongoDocument):
...     db_name = "test"
...     collection_name = "mongokit"
...     structure = {
...         "bla":{
...             "foo":unicode,
...             "bar":int,
...         },
...         "spam":[],
...     }
>>> json = '{"_id": "mydoc", "bla": {"foo": "bar", "bar": 42}, "spam": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]}'
>>> mydoc = MyDoc.from_json(json)
>>> mydoc
{'_id': 'mydoc', 'bla': {'foo': 'bar', 'bar': 42}, 'spam': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]}
Note that from_json takes care of all your embedded docs:
>>> class EmbedDoc(MongoDocument):
...     db_name = "test"
...     collection_name = "mongokit"
...     structure = {
...         "bla":{
...             "foo":unicode,
...             "bar":int,
...         },
...         "spam":[],
...     }
>>> class MyDoc(MongoDocument):
...     db_name = "test"
...     collection_name = "mongokit"
...     structure = {
...         "doc":{
...             "embed":EmbedDoc,
...         },
...     }
...     use_autorefs = True
>>> json = '{"doc": {"embed": {"_id": "embed", "bla": {"foo": "bar", "bar": 42}, "spam": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]}}, "_id": "mydoc"}'
>>> mydoc = MyDoc.from_json(json)
>>> mydoc
{'doc': {'embed': {u'_id': u'embed', u'bla': {u'foo': u'bar', u'bar': 42}, u'spam': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]}}, '_id': u'mydoc'}
>>> isinstance(mydoc['doc']['embed'], EmbedDoc)
True