No consistent coding style applied nor guidance document available

Issue #52 new
Stephen Biggs-Fox created an issue

The gs2 code-base does not have a consistent coding style applied to it. Also, there is no document with coding-style guidance in it for developers to follow when modifying code. Therefore, I propose that resolution of this issue will add such style guidance to the CONTRIBUTING document and produce some tools to help developers apply the style consistently. Furthermore, I propose that source files have the newly defined coding style applied to them when they are changed as part of future pull requests (rather than doing a blanket application now as that is likely to cause lots of merge conflicts with almost every branch, which is a headache we don't need). Finally, I have already taken some time to develop some tools, work-flows and style guidelines in this area so I have assigned this issue to me.

Comments (15)

  1. Joseph Parker

    Thanks Steve - yes, this is something we should do. I can access to similar documents for my colleagues' codes... though some of these are thesis-length! I think we should aim for something similar to BOUT++'s guidance. Enough detail, but short enough to read.

    We can also set up default text in pull requests to prompt people to check their style.

    But I think the most important thing is to get the existing code in the style we want it. Ultimately, people mostly adopt the style of whatever code they're altering.

  2. Stephen Biggs-Fox reporter

    OK, thanks for that @josephparker

    The BOUT++ one looks ideal - I think I will even copy-paste that as a starting point and edit as required.

    @josephparker @ZedThree @daviddickinson Can we have a discussion here about what to put in before I do it. My current plan is this:

    • Brief preamble of why coding standards are important
    • License and copyright information at the top of every file.
      • Provide script to do this automatically.
      • This could be done for all files now as it is unlikely to create merge conflicts.
      • Can we run the script to update automatically (if necessary) on push?
    • Implicit None
      • Once per program / module or once per procedure?
      • Currently inconsistently applied: 62% of source files have more than 1 implicit none statement, 26% have only 1 implicit none (and 12% have zero!). Very roughly, about 75% of procedures have implicit none.
      • I think once per procedure is unnecessary extra code that reduces readability (increases cognitive load - what is this line? Oh, it's just implicit none so I can ignore that) so we should go for once per program / module
      • The chosen style can be applied consistently via a script. This ensures all programs / modules have implicit none and thus removes the need for redundant extra statements in procedures as a backup.
      • This can probably be done to all files now as any merge conflicts will be easy to solve.
      • Can we run the script to check for consistency and update automatically (if necessary) on push?
    • Indentation, Whitespace and Continuation Lines
      • Very inconsistently applied across the project. Some files are not even nearly indented "correctly" (i.e. in a way that helps with readability)..
      • Easiest way to apply this consistently is to use fprettify (python package). From playing around with it, running it like this fprettify -i 2 -w 4 seems to cause the fewest changes to the source code. This results in indentation of 2 spaces per nested block and white-space around operators, print/read/write, plus/minus, multiply/divide and type component selector. This also looks pretty neat.
      • fprettify struggles with long lines and goto statements - affected files will have to have long lines broken up and goto statements removed / replaced before fprettify can be applied
      • I don't know what to recommend for continuation lines.
        • Should we prefer continuation lines or new lines, e.g. a use statement with a long list of stuff in the only section - should it be one use statement that continues over many lines or should we just start a new use statement? Probably the former if I had to choose.
        • Should the continued line start with & or not? Might be clearer to show that it is a continued line but not sure what is done in the code currently.
        • Where should a line break? 132 is the maximum permitted by the standard, although we allow longer lines via compiler options. Do we want to break earlier than 132 to give a safety margin? Not sure what's currently done.
      • This should probably be done to files as we go along rather than in a big batch now as it is likely to make every line a merge conflict that is not easy to see the fix.
      • Can we run the script to apply this automatically and update the repo (if necessary) on push?
    • Unit tests
      • People should write them
    • Documentation
      • Currently inconsistent throughout the code-base but David is trying to get it in order.
      • Built using FORD configured to recognise doxygen format. Maybe David can provide some guidance on the correct way to do it.
      • Can we re-build the docs automatically on merge of PRs into master?
    • Naming
      • Fortran is case insensitive so everything should be lower case / snake case
      • Other guidance as per BOUT++ (e.g. prefer a longer descriptive name over a shorter abbreviated one)
      • Consistent application of lower case can be done by a script. This is probably best done as things are changed rather than all at once as it could cause lots of annoying merge conflicts.
      • Can we run the script to apply this automatically and update the repo (if necessary) on push?
    • Philosophy / culture / how do you want to word this subtitle?
      • Procedures that are missing tests / documentation and files that are missing prettification should be fixed as and when they are changed, i.e. people should not try to fix everything at once.
  3. David Dickinson

    I'm not sure Provide script to do this automatically. is likely to be possible (without writing a monster processing/parsing script), although it would be great if we could.

  4. David Dickinson

    Built using FORD configured to recognise doxygen format. Maybe David can provide some guidance on the correct way to do it. see recent commits merged into next.

  5. David Dickinson

    Can we re-build the docs automatically on merge of PRs into master?

    Yes but that's not helpful until we find a place to host these (they're quite large)!

    Fortran is case insensitive so everything should be lower case / snake case
    

    Being case insensitive doesn't mean these are the only options though

  6. Stephen Biggs-Fox reporter

    "I'm not sure Provide script to do this automatically. is likely to be possible"

    For which one? Some are easier than others.

  7. Stephen Biggs-Fox reporter

    "Can we re-build the docs automatically on merge of PRs into master?

    Yes but that's not helpful until we find a place to host these (they're quite large)!"

    What do other projects do? What about github pages? What about readthedocs? Can we mirror the gs2 repo somewhere so we get those automatically?

  8. Stephen Biggs-Fox reporter

    "Fortran is case insensitive so everything should be lower case / snake case

    Being case insensitive doesn't mean these are the only options though"

    What I mean is, camelCase or PascalCase doesn't make sense in a case insensitive system. Also, the majority of the code-base is lower snake case already. What do you suggest we use?

  9. David Dickinson

    For which one? Some are easier than others.

    I thought the proposal was to provide a single script that enforces the entire style guide.

    What do other projects do? What about github pages? What about readthedocs? Can we mirror the gs2 repo somewhere so we get those automatically?
    

    Currently I'm hosting a trial version at https://www-users.york.ac.uk/~dd502/gs2Docs/index.html I think a lot of other projects present more of a manual built out of simple pages (/tex) rather than hosted full source code documentation (of course there's a mix of examples). Readthedocs is a possibility but I suspect it might be a bit tricky to get our existing documentation into that form (it's on my todo list).

    What I mean is, camelCase or PascalCase doesn't make sense in a case insensitive system. Also, the majority of the code-base is lower snake case already. What do you suggest we use?
    

    I'm not saying lower snake case isn't a good choice just that you can decide to use camelCase as the style if you want -- it just means a style failure isn't something that prevents compilation. I'm generally happy with lower snake case.

  10. Stephen Biggs-Fox reporter

    "I thought the proposal was to provide a single script that enforces the entire style guide."

    I was imagining a collection of scripts where each one does one specific aspect (e.g. one for copyright, one for implicit none, etc). But yes, you could easily wrap the collection in a master script that calls each one so there is one script to enforce all of the easily automated bits of the style guide. I accept that some aspects may be harder to automate than others but, of the ones I have indicated above (copyright+license, implicit none, indentation/whitespace, lower-case), which ones are you not sure are likely to be possible?

  11. David Dickinson

    For example if the style was to enforce implicit none in every routine, that's going to be tricky to automate in general (even for the module level that might be hard to automate). Changing to lower case has to respect comments (i.e. we probably don't want to be forcing comments to be lower case), we can detect camel case and change to snake case but this has to be done before we enforce lower case etc. It's not impossible I can just imagine that it might become quite complicated and difficult to maintain. We can do it in BOUT++ because clang-format exists.

  12. Stephen Biggs-Fox reporter

    "I think a lot of other projects present more of a manual built out of simple pages (/tex) rather than hosted full source code documentation"

    I'd say both are useful for different audiences. A LaTeX manual is aimed at users, while source code documentation is more for developers (I'm aware that for gs2 most people are both). Therefore, it may be sufficient to just say that the source code documentation is built by developers and viewed locally (i.e. not officially hosted at all - although might be worth keeping the one on www-users.york.ac.uk/~dd502 as an unofficial snapshot of e.g. the latest release). But we should probably host a LaTeX user guide somewhere.

    In terms of this PR, the original question of "can we build the docs automatically on merge to master?"... I guess the answer in the short term is no. Depending on what we decide for the above, it might be possible to do something but that's further down the line so let's ignore that for now.

  13. David Dickinson

    Sure -- note "But we should probably host a LaTeX user guide somewhere." -- actually I think if we can write the user manual in markdown then this might give us a bit more flexibility in terms of what we do with it.

  14. Stephen Biggs-Fox reporter

    I think enforcing implicit none is fairly straightforward, especially if it's one per module: Delete all implicit none statements, match the pattern for the start of a program / module, skip any use lines, insert implicit none. It's a few lines of sed or similar.

    Lower case might be more tricky. Respecting comments is probably OK but trying to convert camel case to snake case could be messy - I hadn't thought of that. Maybe we can leave that one for now.

  15. Stephen Biggs-Fox reporter

    "if we can write the user manual in markdown then this might give us a bit more flexibility"

    OK. Let's do that (as a separate issue).

  16. Log in to comment