tristanz / ScytheMCMC (http://bytebucket.org/tristanz/scythemcmc/wiki/html/index.html)

A Scythe Markov Chain Monte Carlo C++ Framework

Clone this repository (size: 1.0 MB): HTTPS / SSH
$ hg clone http://bitbucket.org/tristanz/scythemcmc
commit 13: 9844e0828091
parent 12: fd1ece924824
branch: default
tags: tip
Updates. Switched to Lecuyer and enable seed saving.
Tristan Zajonc / tristanz
14 months ago
ScytheMCMC / ScytheBugs.txt
r13:9844e0828091 81 loc 3.3 KB embed / history / annotate / raw /
Scythe Bugs and Missing Features.

Bugs:
- mean, sd, var, matrix methods etc are templated to be the same type as the
  matrix.  However these functions should return double, never int.

- Some matrix functions give compile errors because for int.  For instance
  sd() on a int matrix type fails because std:pow(int, int) is not valid.

- Something is messed up with rexp(). It's missing in the help file but:
  http://scythe.wustl.edu/api/classscythe_1_1rng_55184ef69537f8a47f82c1df4867d5dc.html#55184ef69537f8a47f82c1df4867d5dc
  Compiling the help using the latest doxygen seems to correct the problem, 
  although macros don't get documented really.

- Docs for scythe::min refer to max.

- What is linesearch? Docs are not very clear.

- Reading in from file is fragile.  outsheet in stata on OSX does not
  use the proper newline characters.  newlines across dos/unix/osx differ.

- col_interchange doesn't sound like it is what it is in help file.  It swaps
	rows rather than reordering.

- qnorm1 has return type "double double sd double" in docs.

Missing / Potential Features:

- rtnorm is parameterized using the variance whereas rnorm is parameterized
  with sd.  Possibility of errors.

- Fast versions of functions proportional to log density.  In most MCMC
  applications sum(log(scythe::dnorm)), for instance, takes up a significant
  amount of time.  However we usually just need the log of something
  proportional to density (for Slice or MH).  For instance, 
  dnorm(x, mean, sd, fastlog=true) = -(x-mean)^2/(2 * sd^2), which avoids
  and exp, sqrt, log, and a few other operations.  I get a 10
  fold increase in speed (of my entire MCMC algorithm) by replacing a
  sum(log(dnorm)) with 

	// faster sum log dnorm.
	template <typename T, matrix_order PO, matrix_style PS>
	double sum_fast_log_dnorm(const Matrix<T,PO,PS> &A, double mean, double sd)
	{
	  double result = 0;
	  for(int i = 0; i < A.size(); ++i) {
	    result += fast_log_dnorm(A[i], mean, sd);
	  }
	  return result;
	}

	double fast_log_dnorm(double x, double mean, double sd) {
	  return -pow((x - mean),2)/(2*pow(sd,2));
	}

  I haven't figured it out, but this should also be vectorizable using SSE
  and multiple cores.  Could lead to a lead to a 40 fold total increase in
  speed.

- qnorm1 is the only inverse cdf available.  It would be nice to have
  all the q functions available.  These should be in R, no?

- Missing some common random types, like multinomial.

- Upgrade MT to use SSE instructions. Honestly, however, random number
  generation speed does not seem to matter at all.  It's all the density
  calls, exp, log, pow, sqrt, which is where optimization should focus.

- How hard would it be to add SSE instructions to log and exp, at the very 
  least?  Might get 5 fold increase in speed, without MKL.
  See: http://gruntthepeon.free.fr/ssemath/

- The easiest path towards performance would be to fix things allow
  the Intel MKL to do optimizations.  I got 2-3x performance increase using
  Intel C++ compiler over g++4.01 without changing anything.
  Using getArray(), it would
  be easy to wrap MKL functions into Scythe with a day or two of work.  
  I suspect gains would be large, since it takes advantage of SSE, multi cores
  etc. Then add a -DHAS_MKL flag. See:
	http://www.intel.com/software/products/mkl/docs/WebHelp/mkl.htm