In my last post, I wrote about how Newton's method (coupled with a fast multiplication algorithm, such as Karatsuba multiplication) can be used for asymptotically fast division of long integers. I have now fixed up the code to compute a correctly rounded quotient. In fact, it now performs the full divmod operation, returning both the quotient and a correct remainder.

The trick is to perform an extra exact multiplication at the end to determine the remainder. By definition, the quotient

*r*of

*p*/

*q*is correct if the remainder

*m*=

*p*-

*r*·

*q*satisfies 0 ≤

*m*<

*q*. If this inequality does not hold, one need only perform an additional divmod operation (which can be performed using standard long division, since the error will almost certainly fit in a single limb) to correct both the quotient and the remainder.

The extra multiplication causes some slowdown, but it's not too bad. The new

`idivmod`function still breaks even with the builtin

`divmod`somewhere around 1000-2000 digits and is 10 times faster at half a million digits (i.e. when dividing a million digit number by a half-million digit number):

>>> time_division(int, 2.0, 20)

size old time new time faster correct

------------------------------------------------------------

16 0.000008 0.000052 0.155080 True

32 0.000005 0.000052 0.102151 True

64 0.000008 0.000059 0.132075 True

128 0.000013 0.000070 0.190476 True

256 0.000035 0.000107 0.325521 True

512 0.000126 0.000215 0.583658 True

1024 0.000431 0.000532 0.810399 True

2048 0.001855 0.001552 1.195104 True

4096 0.007154 0.005050 1.416708 True

8192 0.028505 0.015449 1.845033 True

16384 0.111193 0.046938 2.368925 True

32768 0.443435 0.142551 3.110706 True

65536 1.778292 0.432412 4.112497 True

131072 7.110184 1.305771 5.445200 True

262144 28.596926 3.919399 7.296253 True

524288 116.069764 11.804032 9.833061 True

As a bonus, a fast divmod also provides a fast way to convert long integers to decimal strings, by recursively splitting

*n*in half using

*L*,

*R*= divmod(

*n*, 10

^{b/2}) where

*b*is the number of digits in

*n*:

>>> time_str(int, 20)

size old time new time faster correct

------------------------------------------------------------

16 0.000005 0.000013 0.413043 True

32 0.000006 0.000009 0.588235 True

64 0.000008 0.000012 0.674419 True

128 0.000020 0.000023 0.865854 True

256 0.000059 0.000133 0.442105 True

512 0.000204 0.000333 0.613255 True

1024 0.000895 0.001194 0.749708 True

2048 0.003505 0.002252 1.556824 True

4096 0.013645 0.006600 2.067429 True

8192 0.052386 0.018334 2.857358 True

16384 0.209164 0.052233 4.004412 True

32768 0.834201 0.153238 5.443827 True

65536 3.339629 0.450897 7.406639 True

131072 13.372223 1.339044 9.986392 True

262144 53.547894 3.998352 13.392491 True

524288 214.847486 11.966933 17.953429 True

So printing an integer with half a million digits takes 12 seconds instead of 3.5 minutes. This can be quite a usability improvement.

The code can be found in this file: div.py.

I have now submitted a request for these algorithms to be implemented in the Python core. Since the pure Python implementation is very simple, I don't think porting it to C would be all that difficult. By "request", I mean that I might eventually do it myself, but there are many others who are much more familiar with both the Python codebase and the C language, and if any of those persons happens to have a few free hours, they could certainly do it both faster and better. If you are interested in helping out with this, please post to the issue tracker.

The division code is just one of several small projects I've been working on lately. Basically, I've found that there are some arithmetic functions that are needed very frequently in all kinds of mathematical code. These include:

- Pure convenience functions like
`sign`,`product`, ... - Bit counting
- Radix conversion
- Integer square roots
- Rational (and complex rational) arithmetic
- Integer factorization, multiplicity tests, gcd, etc
- Factorials, binomial coefficients, etc

Such utility functions are currently scattered throughout mpmath, SymPy and SympyCore codebases. Many of them are hack-ish, duplicates, and/or don't always work/have very poor worst-case performance. Right now, I'm trying to collect them into a single module and optimizing / strain-hardening them.

The current result of this effort can be found here (be aware that it's very much a work in progress). Among other things, it includes the mentioned fast division code, fast square root code based on my implementation from mpmath, much needed improvements to the nth root and integer factorization code I wrote for SymPy, plus the extremely fast multinomial coefficient code optimized by Pearu and myself for SympyCore (which, if I correctly recall the results of previous benchmarking, makes polynomial powering in SympyCore faster than Singular).

My plan is to split this off to a standalone project, as it could be useful to other people as such, but keeping it a single, self-contained .py file that is easy to include in mpmath and SymPy. There won't be too much feature creep (hopefully); the advanced number theory functions in SymPy won't move, nor will the floating-point functions from mpmath.

Finally, I have done a bit more work on the adaptive numerical evaluation code for SymPy. The main new features are support for converting approximate zeros to exact zeros (thereby preventing the algorithm from hanging when it encounters a nonsimplified zero expression, and overall prettifying output), and support for logarithms and powers/exponentials of complex numbers. See evalf.py and test_evalf.py. I've also done miscellaneous other work on SymPy, such as patching the existing evalf code to use mpmath and getting rid of the global precision.