Scythe Bugs and Missing Features.
Bugs:
- mean, sd, var, matrix methods etc are templated to be the same type as the
matrix. However these functions should return double, never int.
- Some matrix functions give compile errors because for int. For instance
sd() on a int matrix type fails because std:pow(int, int) is not valid.
- Something is messed up with rexp(). It's missing in the help file but:
http://scythe.wustl.edu/api/classscythe_1_1rng_55184ef69537f8a47f82c1df4867d5dc.html#55184ef69537f8a47f82c1df4867d5dc
Compiling the help using the latest doxygen seems to correct the problem,
although macros don't get documented really.
- Docs for scythe::min refer to max.
- What is linesearch? Docs are not very clear.
- Reading in from file is fragile. outsheet in stata on OSX does not
use the proper newline characters. newlines across dos/unix/osx differ.
- col_interchange doesn't sound like it is what it is in help file. It swaps
rows rather than reordering.
- qnorm1 has return type "double double sd double" in docs.
Missing / Potential Features:
- rtnorm is parameterized using the variance whereas rnorm is parameterized
with sd. Possibility of errors.
- Fast versions of functions proportional to log density. In most MCMC
applications sum(log(scythe::dnorm)), for instance, takes up a significant
amount of time. However we usually just need the log of something
proportional to density (for Slice or MH). For instance,
dnorm(x, mean, sd, fastlog=true) = -(x-mean)^2/(2 * sd^2), which avoids
and exp, sqrt, log, and a few other operations. I get a 10
fold increase in speed (of my entire MCMC algorithm) by replacing a
sum(log(dnorm)) with
// faster sum log dnorm.
template <typename T, matrix_order PO, matrix_style PS>
double sum_fast_log_dnorm(const Matrix<T,PO,PS> &A, double mean, double sd)
{
double result = 0;
for(int i = 0; i < A.size(); ++i) {
result += fast_log_dnorm(A[i], mean, sd);
}
return result;
}
double fast_log_dnorm(double x, double mean, double sd) {
return -pow((x - mean),2)/(2*pow(sd,2));
}
I haven't figured it out, but this should also be vectorizable using SSE
and multiple cores. Could lead to a lead to a 40 fold total increase in
speed.
- qnorm1 is the only inverse cdf available. It would be nice to have
all the q functions available. These should be in R, no?
- Missing some common random types, like multinomial.
- Upgrade MT to use SSE instructions. Honestly, however, random number
generation speed does not seem to matter at all. It's all the density
calls, exp, log, pow, sqrt, which is where optimization should focus.
- How hard would it be to add SSE instructions to log and exp, at the very
least? Might get 5 fold increase in speed, without MKL.
See: http://gruntthepeon.free.fr/ssemath/
- The easiest path towards performance would be to fix things allow
the Intel MKL to do optimizations. I got 2-3x performance increase using
Intel C++ compiler over g++4.01 without changing anything.
Using getArray(), it would
be easy to wrap MKL functions into Scythe with a day or two of work.
I suspect gains would be large, since it takes advantage of SSE, multi cores
etc. Then add a -DHAS_MKL flag. See:
http://www.intel.com/software/products/mkl/docs/WebHelp/mkl.htm