[Ed: This post is about real work, not just my regular mumblings.]
I was delighted to find that doxygen, one of the world’s best code documentation tools, now supports Fortran 90 [1]. This means that a very large body of scientific code can now be documented with the same awesome tools as many excellent codes out there, e.g. VTK.
I checked out doxygen version 1.5.5 and managed to get it to work on our lab group’s codebase: about 27,000 lines of Fortran with some Python glue.
Fortran support is clearly preliminary - the documentation doesn’t specify the syntax, and it took reading the lexer grammar file and some trial-and-error to figure it out, but it was worth it in the end.
If anyone out there is interested, here’s the summary of what I’ve found out:
Getting doxygen 1.5.5 to work with Fortran code
- Patch the 1.5.5 release source code as described here to avoid crazy parser errors with multi-line loops, if statements and things like that.
- If your Fortran source files that use coco have the .coco extension, or your interface files have the .int extension, add these lines to initDoxygen() (near line 8930) in doxygen.cpp:
    Doxygen::parserManager->registerParser(".int", new FortranLanguageScanner);
    Doxygen::parserManager->registerParser(".coco", new FortranLanguageScanner);
- Compile, install, etc.
- Document the files like this for multiline headers:
    !> A constant function
    !!
    !<
    function f(x)
    x = 1
    end function
- One-line documentation looks like this:
    real (kind=8) :: pi = 3.1415926d0 !< A good enough approximation
- In Doxyfile, add *.int (and possibly *.coco) to FILE_PATTERNS for interface files and coco.
Stuff that still doesn’t work
- The doxygen parser sporadically chokes on files with multiple comment indicators. In Fortran, lines beginning with ‘*’, ‘c’ or ‘!’ are all considered valid comments. However, if more than one of these characters is used in the same file, the parser sometimes dies with the error message “Error: EOF reached in wrong state (end missing)”. I haven’t figured out why this happens, but in all the cases where this error occurred, I managed to fix it by standardizing all comments to ‘!’. However, there are some files with mixed comment indicators that parse fine. Go figure.
- Doxygen does not correctly process function or subroutine headers with continuation symbols, e.g.
    function f ( real x, &
                 real y)
  or
    function f ( real x, real &
                 y)
  It will think there is a symbol called real& or &y or something like that and throw an ‘unknown variable’ warning.
- Doxygen sometimes gives “Found unknown command: \todo” or “Found unknown command: @todo” warnings in documentation blocks with the \todo command. I think the syntax is correct though, and it doesn’t happen all the time.
- Doxygen appears to not recognize out-of-place documentation blocks for variables, e.g.
    !>
    !! \var int f
    !! f is zero.
    !<
    int f = 0
  With code like this, doxygen gives a “member with no name found” warning. I’m not sure if this is because I have the wrong syntax.
- Doxygen does not respect the case-insensitive nature of Fortran, e.g.
    !>
    !! \param x argument of function
    !<
    function f(X)
    real x
  With this code, doxygen returns an “argument `x’ of command \param is not found in the argument list of f(X)” warning.
- I can’t get multi-line in-code documentation blocks to work, e.g.
    int x !< a
          !< variable
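The workaround for the mixed comment indicator problem in the list above (standardizing all comments to ‘!’) is easy to script. Here is a minimal Python sketch of that idea. It assumes fixed-form source, where a ‘c’, ‘C’ or ‘*’ in column 1 always marks a comment; running it on free-form source would mangle statements such as call or character that start in column 1. The function name normalize_comment_markers is my own invention for illustration, not part of doxygen.

```python
def normalize_comment_markers(source: str) -> str:
    """Rewrite fixed-form comment lines so that they all start with '!'.

    Assumption: the input is fixed-form Fortran, so a 'c', 'C' or '*'
    in column 1 can only be a comment marker, never part of a statement.
    """
    out = []
    for line in source.splitlines(keepends=True):
        # Only the first character is replaced; the comment text is kept as-is.
        if line[:1] in ("c", "C", "*"):
            line = "!" + line[1:]
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    sample = (
        "c old-style comment\n"
        "* another old-style comment\n"
        "! modern comment\n"
        "      call foo(x)\n"
    )
    print(normalize_comment_markers(sample), end="")
```

Lines that already start with ‘!’ (including doxygen’s !> and !< blocks) and indented statements pass through unchanged, so it should be safe to run the script repeatedly on the same file.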
It would also be great if doxygen did some other things, like recognizing parameters to data type declarations (e.g. optional, parameter, intent(inout), allocatable), but overall I’m just happy right now to be able to use doxygen for my Fortran code.
Footnotes
[1] Some of you will doubtless say - what? you’re still using Fortran!? - but it’s the de facto standard in the scientific computing world. For many reasons - a large existing codebase and the incredible pain in interfacing Fortran and C code being the top two - Fortran is still alive and kicking.
I personally much prefer Fortran to C. Too bad almost no one outside the physical sciences uses it.
For scientific usage, I’d rather use interpreted languages designed for such a purpose, e.g. Matlab, Maple, Octave, Scilab etc., instead of using C.
see http://en.wikipedia.org/wiki/Interpreted_language for more info.
Interfacing such languages with fortran is quite a breeze actually. For example some of the commonly used functions in scilab are not coded in the built-in interpreter but actually interface compiled code from fortran sources.
Anyway, back to the main gist of your article… From my experience, most code (scientific or otherwise) isn’t written with helpful documentation, so without a lot of additional eye/brain/typing power, there’s not much doxygen can do.
It is well known that it’s one thing to write/debug your own code, but understanding/debugging other people’s code (especially those undocumented ones) is something on a totally different level.
sepethrea:
I believe you’re the first blood elf to visit my realm. Welcome!
As for your comment about interpreted languages, they’re great for testing out new algorithms and new methods on toy systems. Python with scipy is my favorite.
However, environments like Matlab are so incredibly inefficient that for any serious application, one most certainly needs to write at least the time-critical kernels in a compiled language.
A good chunk of the slowness can be traced back to the need for Matlab etc. to be generic, i.e. their routines must work for the most general possible case, and there’s a lot of overhead needed to figure that out. However, code written yourself can sidestep most (if not all) of that overhead if you know your application well enough.
For example, if you want to solve Ax = b for a positive definite, nonsingular, symmetric matrix A, it’s a no-brainer to go for something like Gaussian elimination or conjugate gradients to attack the problem; however, if you had to implement the most generic solver, you would basically have to resort to something like SVD, which is MUCH more expensive. Smarter codes can try to figure out when to apply which algorithm, but that’s still overhead that can be avoided.
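To make that concrete, here is a minimal pure-Python sketch of conjugate gradients for a symmetric positive definite A. It is only an illustration of how little machinery the specialized method needs when you already know the structure of the matrix; the function name conjugate_gradient and the tol and max_iter parameters are my own choices, and real code would call a tuned library routine instead.

```python
def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b by conjugate gradients.

    Assumption: A is a symmetric positive definite matrix given as a
    list of rows, and b is a list of floats of the same dimension.
    """
    n = len(b)
    if max_iter is None:
        max_iter = 10 * n
    x = [0.0] * n
    r = list(b)  # residual b - A x for the initial guess x = 0
    p = list(r)  # first search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break  # residual norm is small enough
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs_old / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(ri * ri for ri in r)
        # The new direction is A-conjugate to the previous ones.
        p = [r[i] + (rs_new / rs_old) * p[i] for i in range(n)]
        rs_old = rs_new
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    print(conjugate_gradient(A, b))  # close to [1/11, 7/11]
```

The method never checks whether A really is symmetric or positive definite. Skipping that kind of check is exactly the overhead a generic solver cannot avoid.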
I agree that most code out there is poorly documented, but that’s no reason to do the same with one’s own code.
Also, contrary to your claim, doxygen has incredibly powerful features for undocumented code. For example, it is smart enough to compile lists of classes, modules, functions and other structured objects in code. And the killer features IMO are the call graph and caller graph, which allow you to look at the control flow and data flow of the program without a single line of comments. Yes, I grant that such tools will not be of much help if the functions are called things like X and NewX and NewTmpX4, but it’s a lot better than trying to figure it out without doxygen. That alone is sufficient reason for me to be really happy about it.