| Submitter | Austin Clements |
|---|---|
| Date | 2011-11-06 17:17:36 |
| Message ID | <1320599856-24078-1-git-send-email-amdragon@mit.edu> |
| Permalink | /patch/1465/ |
| State | New |
Comments

On Sun, 6 Nov 2011 12:17:36 -0500, Austin Clements <amdragon@MIT.EDU> wrote:
> This is a rebase and cleanup of Istvan Marko's patch from
> id:m3pqnj2j7a.fsf@zsu.kismala.com
>
> Search retrieves these headers for every message in the search
> results. Previously, this required opening and parsing every message
> file. Storing them directly in the database significantly reduces IO
> and computation, speeding up search by between 50% and 10X.

Hi, sounds good, but...

> Taking full advantage of this requires a database rebuild, but it will
> fall back to the old behavior for messages that do not have headers
> stored in the database.

...what's the most convenient way of rebuilding the database while
preserving my tags etc.? If this was merged, would an older version of
notmuch choke on the rebuilt database with these headers? (To me it
looks like it would be fine.)

BR,
Jani.

On Sun, 6 Nov 2011 12:17:36 -0500, Austin Clements <amdragon@MIT.EDU> wrote:
> Search retrieves these headers for every message in the search
> results. Previously, this required opening and parsing every message
> file. Storing them directly in the database significantly reduces IO
> and computation, speeding up search by between 50% and 10X.

Just tried the patch and I can confirm that, after rebuilding the
database, it makes searches a lot faster.

Cheers,
Daniel

On Sun, 06 Nov 2011 23:07:51 +0200, Jani Nikula <jani@nikula.org> wrote:
> ...what's the most convenient way of rebuilding the database while
> preserving my tags etc.? If this was merged, would an older version of
> notmuch choke on the rebuilt database with these headers? (To me it
> looks like it would be fine.)

Here's what I did:

```sh
notmuch dump > tags.db      # save every message's tags to a text file
rm -rf ~/Maildir/.notmuch   # delete the index (my mail root is ~/Maildir)
notmuch new                 # re-index all messages from scratch
notmuch restore < tags.db   # re-apply the saved tags
```

Cheers,
Daniel

On Sun, 6 Nov 2011 12:17:36 -0500, Austin Clements <amdragon@MIT.EDU> wrote:
> This is a rebase and cleanup of Istvan Marko's patch from
> id:m3pqnj2j7a.fsf@zsu.kismala.com
>

Fantastic performance improvement, Austin! This should be merged in ASAP.

BTW, compacting the db from time to time also has a significant impact.

Running:

```sh
$ du -h .notmuch
$ sync && sudo /sbin/sysctl vm.drop_caches=3
$ time notmuch search "*" | wc -l
```

on:

1. original database, compacted some time ago
2. fresh database generated before patching, non-compacted
3. fresh database generated after patching, non-compacted
4. fresh database generated after patching, compacted with

   ```sh
   $ mv .notmuch/xapian .notmuch/xapian-fat
   $ xapian-compact --no-renumber .notmuch/xapian-fat .notmuch/xapian
   ```

Results:

| db                       | 1         | 2        | 3         | 4         |
|--------------------------|-----------|----------|-----------|-----------|
| db size                  | 272M      | 289M     | 291M      | 172M      |
| results (`wc -l` count)  | 9536      | 9540     | 9540      | 9540      |
| real                     | 1m42.221s | 2m3.193s | 0m30.762s | 0m10.505s |
| user                     | 0m8.379s  | 0m8.133s | 0m4.043s  | 0m3.353s  |
| sys                      | 0m5.216s  | 0m4.933s | 0m1.530s  | 0m1.000s  |

Peace

On Fri, 11 Nov 2011 02:33:38 +0100, Pieter Praet <pieter@praet.org> wrote:
> On Sun, 6 Nov 2011 12:17:36 -0500, Austin Clements <amdragon@MIT.EDU> wrote:
> > This is a rebase and cleanup of Istvan Marko's patch from
> > id:m3pqnj2j7a.fsf@zsu.kismala.com
> >
>
> Fantastic performance improvement Austin! [...]

... and Istvan Marko, of course! Thanks!

Peace

Quoth Pieter Praet on Nov 11 at 2:38 am:
> On Fri, 11 Nov 2011 02:33:38 +0100, Pieter Praet <pieter@praet.org> wrote:
> > On Sun, 6 Nov 2011 12:17:36 -0500, Austin Clements <amdragon@MIT.EDU> wrote:
> > > This is a rebase and cleanup of Istvan Marko's patch from
> > > id:m3pqnj2j7a.fsf@zsu.kismala.com
> > >
> >
> > Fantastic performance improvement Austin! [...]
>
> ... and Istvan Marko, of course! Thanks!

Yes. This is really Istvan's patch. I just dug it out of the archives
and cleaned up some whitespace.

Patch

```diff
diff --git a/lib/database.cc b/lib/database.cc
index fa632f8..e4ef14e 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -1725,7 +1725,7 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
            goto DONE;
 
        date = notmuch_message_file_get_header (message_file, "date");
-       _notmuch_message_set_date (message, date);
+       _notmuch_message_set_header_values (message, date, from, subject);
 
        _notmuch_message_index_file (message, filename);
     } else {
diff --git a/lib/message.cc b/lib/message.cc
index 8f22e02..ca7fbf2 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -412,6 +412,21 @@ _notmuch_message_ensure_message_file (notmuch_message_t *message)
 const char *
 notmuch_message_get_header (notmuch_message_t *message, const char *header)
 {
+    std::string value;
+
+    /* Fetch header from the appropriate xapian value field if
+     * available */
+    if (strcasecmp (header, "from") == 0)
+        value = message->doc.get_value (NOTMUCH_VALUE_FROM);
+    else if (strcasecmp (header, "subject") == 0)
+        value = message->doc.get_value (NOTMUCH_VALUE_SUBJECT);
+    else if (strcasecmp (header, "message-id") == 0)
+        value = message->doc.get_value (NOTMUCH_VALUE_MESSAGE_ID);
+
+    if (!value.empty())
+        return talloc_strdup (message, value.c_str ());
+
+    /* Otherwise fall back to parsing the file */
     _notmuch_message_ensure_message_file (message);
     if (message->message_file == NULL)
        return NULL;
@@ -795,8 +810,10 @@ notmuch_message_set_author (notmuch_message_t *message,
 }
 
 void
-_notmuch_message_set_date (notmuch_message_t *message,
-                           const char *date)
+_notmuch_message_set_header_values (notmuch_message_t *message,
+                                    const char *date,
+                                    const char *from,
+                                    const char *subject)
 {
     time_t time_value;
 
@@ -809,6 +826,8 @@ _notmuch_message_set_date (notmuch_message_t *message,
 
     message->doc.add_value (NOTMUCH_VALUE_TIMESTAMP,
                             Xapian::sortable_serialise (time_value));
+    message->doc.add_value (NOTMUCH_VALUE_FROM, from);
+    message->doc.add_value (NOTMUCH_VALUE_SUBJECT, subject);
 }
 
 /* Synchronize changes made to message->doc out into the database. */
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index 0d3cc27..60a932f 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -93,7 +93,9 @@ NOTMUCH_BEGIN_DECLS
 
 typedef enum {
     NOTMUCH_VALUE_TIMESTAMP = 0,
-    NOTMUCH_VALUE_MESSAGE_ID
+    NOTMUCH_VALUE_MESSAGE_ID,
+    NOTMUCH_VALUE_FROM,
+    NOTMUCH_VALUE_SUBJECT
 } notmuch_value_t;
 
 /* Xapian (with flint backend) complains if we provide a term longer
@@ -269,9 +271,10 @@ void
 _notmuch_message_ensure_thread_id (notmuch_message_t *message);
 
 void
-_notmuch_message_set_date (notmuch_message_t *message,
-                           const char *date);
-
+_notmuch_message_set_header_values (notmuch_message_t *message,
+                                    const char *date,
+                                    const char *from,
+                                    const char *subject);
 void
 _notmuch_message_sync (notmuch_message_t *message);
 
```