Rust Refactoring and Syntax



I'm a fan of the Rust programming language's semantics, high-level design, and borrowing-based memory safety. Its syntax design, however, has many rough edges. I suspect this is a side effect of the churn the syntax underwent in its early years, when several alternatives were tried and discarded in succession. The result is many small inconsistencies in the syntax that are individually unimportant quirks, but that together make the language unnecessarily difficult to use.

This is very noticeable when performing refactoring. For example, consider the editing process of extracting chunks of related local state stored on some function's call stack into a dedicated struct. This is a common pattern used to make code more modular, reusable, and encapsulated.

First, let's compare how this refactoring happens in a language like C#. Imagine starting off with some function like this:

void Foo()
{
    int barId = 0;
    string barName = "blah!";
       // ...
}

You realise that you actually need two or more “bar” things, so in a moment of weakness you are tempted to write code like the following:

void Foo()
{
    int bar1Id = 0;
    string bar1Name = "blah!";
    int bar2Id = 0;
    string bar2Name = "blah!";
}

Using repeated ordinals in identifiers like that is a code smell, so you extract that snippet into a class, which makes the code in the Foo() function nice and neat again:

class Bar 
{
    int Id = 0;
    string Name = "blah!";
}

void Foo()
{
    Bar bar1 = new(), bar2 = new();
    // or:
    var bars = new Bar[] { new(), new() };
}

In C#, you can literally cut & paste chunks of code like that out of a method into a class definition, and it'll just work. The only additional effort required is to extract the associated code, but that is a given, since the point of this exercise is to improve encapsulation.

Contrast this with Rust, starting off with the equivalent snippet:

fn foo() {
    let bar_id = 0;
    let bar_name = "blah!";
}

Naively, you might try to cut & paste into a struct:

struct Bar {
    let id = 0;
    let name = "blah!";
}

Unfortunately, this fails to compile with a litany of errors. Fixing this takes a surprising number of steps:

Step #1) Remove the let keyword, because it's not used in structs.

struct Bar {
    id = 0;
    name = "blah!";
}

Step #2) Replace semicolons with commas because, unlike every other C-like language, Rust uses a different separator between struct fields:

struct Bar {
    id = 0,
    name = "blah!" 
}

Step #3) Add types, which were inferred in the function body but cannot be inferred in structs:

struct Bar {
   id: usize = 0,
   name: String = "blah!"
}

Step #4) This still won’t compile because Rust doesn’t allow inline defaults! You must then add an impl block with an explicit constructor method:

struct Bar {
   id: usize,
   name: String
}

impl Bar {
   pub fn new() -> Bar {
       Bar {
           id: 0,        // Have to replace '=' with ':' here!
           name: "blah!" // Quoted text can't init a String!
       }
   }
}

Step #5) Note that there are four repetitions of the identifier Bar already, and we're not done yet:

Rust has no default values for function parameters, so several variants of new() have to be written by hand, unless you're lucky and the derived Default impl (#[derive(Default)]) suffices. It doesn't in this example, because it would initialise name to an empty string rather than "blah!".
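A sketch of the difference, assuming the String version of Bar from Step #4 (DerivedBar, with_id, and with_id_and_name are hypothetical names for illustration only): the derived Default fills every field with its type's own default, so only a hand-written impl can supply "blah!", and each combination of omitted arguments needs its own constructor.

```rust
// Derived impl: each field gets its type's own default (0 and an empty String).
#[derive(Debug, Default)]
struct DerivedBar {
    id: usize,
    name: String,
}

// Hand-written version: the "blah!" default must be spelled out, and each
// combination of "defaulted" arguments needs its own constructor.
#[derive(Debug)]
struct Bar {
    id: usize,
    name: String,
}

impl Bar {
    pub fn new() -> Bar {
        Bar::with_id(0)
    }

    pub fn with_id(id: usize) -> Bar {
        Bar::with_id_and_name(id, "blah!")
    }

    pub fn with_id_and_name(id: usize, name: &str) -> Bar {
        // to_string() copies the literal into a heap-allocated String.
        Bar { id, name: name.to_string() }
    }
}

impl Default for Bar {
    fn default() -> Self {
        Bar::new()
    }
}

fn main() {
    println!("{:?}", DerivedBar::default()); // DerivedBar { id: 0, name: "" }
    println!("{:?}", Bar::default());        // Bar { id: 0, name: "blah!" }
}
```

Three constructors plus a Default impl stand in for what would be a single constructor with default arguments in C# or C++.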

Second, to actually fix the string handling, you have to either allocate on the heap with "blah!".to_string(), or create a generic (templated) version of your struct that accepts arbitrary string-like types, including references.

This requires much longer code with even more repetitions of the struct name:

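struct Bar<S: AsRef<str>> {
    pub id: usize,
    pub name: S,
}

impl<S: AsRef<str>> Bar<S> {
    pub fn new_named(id: usize, name: S) -> Bar<S> {
        Bar {
            id,
            name
        }
    }
}

// A string literal only fits string types that can be built from a &'static str,
// so the defaulting constructor needs its own impl block with an extra bound.
impl<S: AsRef<str> + From<&'static str>> Bar<S> {
    pub fn new() -> Self {
        Self::new_named(0, S::from("blah!"))
    }
}

impl<S: AsRef<str> + From<&'static str>> Default for Bar<S> {
    fn default() -> Self { Self::new() }
}

fn foo() {
    let bar1: Bar<&str> = Bar::new_named(0, "test");
    let bar2: Bar<&str> = Bar::default();
}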

Worse still, when moving the associated code into an impl block, every single member access requires the self. prefix because, unlike in most object-oriented languages, it is not implicit.

So for example this snippet:

if id < 0 { 
    name = "invalid";
}

would need two self. prefixes added when moved into an impl block during the refactoring:

fn validate(&mut self) {
    if self.id < 0 {
        self.name = "invalid";
    }
}
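As a runnable sketch of that method in context, assume a hypothetical Bar whose id is a signed i64 (with the usize id used earlier, a `< 0` test can never be true, and rustc flags it as a useless comparison) and whose name is an owned String, so the assignment also needs .to_string():

```rust
// Hypothetical struct: `id` is signed here so that `< 0` can actually be true.
struct Bar {
    id: i64,
    name: String,
}

impl Bar {
    fn validate(&mut self) {
        // Both field accesses need an explicit `self.` prefix.
        if self.id < 0 {
            self.name = "invalid".to_string();
        }
    }
}

fn main() {
    let mut bar = Bar { id: -1, name: "blah!".to_string() };
    bar.validate();
    println!("{}", bar.name); // prints "invalid"
}
```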

This is particularly annoying in code that naturally has short member names, such as 3D maths vectors with X, Y, and Z coordinates, or colors with R, G, and B values. The self. prefixes littered all over the place are visual noise that adds no useful information for the reader. Syntax highlighting in an IDE would be a far better way to distinguish members from non-members.
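For illustration, here is a minimal 3D vector type (a hypothetical Vec3, not taken from any particular library) in which a large share of the tokens in each method body are self. or other. prefixes:

```rust
// A minimal 3D vector with short field names.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Vec3 {
    x: f64,
    y: f64,
    z: f64,
}

impl Vec3 {
    fn dot(&self, other: &Vec3) -> f64 {
        // Three `self.` and three `other.` prefixes for three one-letter fields.
        self.x * other.x + self.y * other.y + self.z * other.z
    }

    fn length(&self) -> f64 {
        // Six `self.` prefixes to square and sum three coordinates.
        (self.x * self.x + self.y * self.y + self.z * self.z).sqrt()
    }

    fn cross(&self, other: &Vec3) -> Vec3 {
        Vec3 {
            x: self.y * other.z - self.z * other.y,
            y: self.z * other.x - self.x * other.z,
            z: self.x * other.y - self.y * other.x,
        }
    }
}

fn main() {
    let a = Vec3 { x: 3.0, y: 4.0, z: 0.0 };
    let b = Vec3 { x: 0.0, y: 0.0, z: 1.0 };
    println!("{} {} {:?}", a.length(), a.dot(&b), a.cross(&b));
}
```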

This is especially irritating because function parameters are often promoted to fields, and fields demoted to parameters. For example, a “context” of some sort may be better associated with the struct than repeated in the parameter list of every single associated function. This move is trivial in languages like Java, C++, or C#, but in Rust it requires fiddly code edits to add or remove the self. prefixes.
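To make the context example concrete, here is a sketch with a hypothetical Context type and hypothetical scaled_width/scaled_height functions (none of these names come from the original). Promoting ctx from a parameter to a field changes every signature, adds a lifetime parameter to the new struct, and turns each ctx. access into self.ctx.:

```rust
// Hypothetical context type, for illustration only.
struct Context {
    scale: f64,
}

// Before: the context is threaded through every function's parameters.
fn scaled_width(ctx: &Context, width: f64) -> f64 {
    width * ctx.scale
}

fn scaled_height(ctx: &Context, height: f64) -> f64 {
    height * ctx.scale
}

// After: the context is promoted to a field. Every signature changes,
// the struct gains a lifetime, and `ctx.` becomes `self.ctx.` everywhere.
struct Layout<'a> {
    ctx: &'a Context,
}

impl<'a> Layout<'a> {
    fn scaled_width(&self, width: f64) -> f64 {
        width * self.ctx.scale
    }

    fn scaled_height(&self, height: f64) -> f64 {
        height * self.ctx.scale
    }
}

fn main() {
    let ctx = Context { scale: 2.0 };
    println!("{} {}", scaled_width(&ctx, 3.0), scaled_height(&ctx, 1.5));

    let layout = Layout { ctx: &ctx };
    println!("{} {}", layout.scaled_width(3.0), layout.scaled_height(1.5));
}
```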

To summarise some of the issues:

  • There are two ways of writing the type parameters: <S: AsRef<str>> where they are declared, and <S> where they are used.
  • Both the struct name and the type parameters are repeated about half a dozen times each, even for such a trivial type.
  • The struct also refers to its own type as Self, but only in some contexts: in some it is required, in others optional. You can use Bar<S> too, again breaking naive search & replace refactorings.
  • Nearly twenty lines of code are required for something that takes just four in other mainstream languages. That's not only 5x more verbose; it is most likely more verbose than the original repetitive code this was aiming to clean up and simplify.
  • This trivial refactoring requires just about every moved line of code to be modified.
  • Complex lifetime annotations make this even worse.
  • If moving self-referential data, then heaven help you, because now you're in for a bunch of “fun” with the std::pin::Pin<_> type.
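As a rough illustration of the lifetime point, here is a hypothetical borrowed-string variant of Bar (a sketch, not code from the original). Even in this simplest case, the lifetime 'a has to be declared on the struct, re-declared on every impl header, and threaded through the borrowed field and the constructor signatures:

```rust
// Hypothetical borrowed-string variant of Bar.
#[derive(Debug)]
struct Bar<'a> {
    pub id: usize,
    pub name: &'a str,
}

impl<'a> Bar<'a> {
    pub fn new() -> Self {
        // A &'static str literal is accepted wherever &'a str is expected,
        // because 'static outlives any 'a.
        Self::new_named(0, "blah!")
    }

    pub fn new_named(id: usize, name: &'a str) -> Bar<'a> {
        Bar { id, name }
    }
}

impl<'a> Default for Bar<'a> {
    fn default() -> Self {
        Self::new()
    }
}

fn main() {
    let owned = String::from("from the heap");
    // bar1 borrows `owned`, so it cannot outlive it.
    let bar1 = Bar::new_named(1, &owned);
    let bar2 = Bar::default();
    println!("{:?} {:?}", bar1, bar2);
}
```

Unlike the generic version, the literal default needs no extra trait bound here, but any struct that stores a Bar<'a> will typically need its own lifetime parameter as well, so the annotations spread outward.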

To be fair, IDE tooling will eventually make much of this pain go away, but as far as I know, no currently available tool is capable of this particular refactoring.