armchair_progamer

joined 1 year ago
MODERATOR OF
 
 

Excerpt:

This is a reconstruction -- extracted and very lightly edited -- of the "prehistory" of Rust development, when it was a personal project between 2006-2009 and, after late 2009, a Mozilla project conducted in private.

The purposes of publishing this now are:

  • It might encourage someone with a hobby project to persevere
  • It might be of interest to language or CS researchers someday
  • It might settle some arguments / disputed historical claims

Rust started being developed 18 years ago. This is how it looked until 14 years ago, where I believe the rest of development is on rust-lang/rust. The first Rust program looks completely different than the Rust we know today.

 

GitHub

Haxe-based language for defining 2D shmups bullet-hell patterns.

The VM also runs in Haxe.

5
submitted 1 week ago* (last edited 1 week ago) by [email protected] to c/[email protected]
 

Soundly handling linearity requires special care in the presence of effect handlers, as the programmer may inadvertently compromise the integrity of a linear resource. For instance, duplicating a continuation that closes over a resource can lead to the internal state of the resource being corrupted or discarding the continuation can lead to resource leakage. Thus a naïve combination of linear resources and effect handlers yields an unsound system.

...

In the remainder of this blog post we describe a novel approach to rule out such soundness bugs by tracking control-flow linearity, a means to statically assure how often a continuation may be invoked which mediates between linear resources and effectful operations in order to ensure that effect handlers cannot violate linearity constraints on resources. We focus on our implementation in Links. The full technical details are available in our open access POPL'24 distinguished paper Soundly Handling Linearity.

5
Syndicated Actors (syndicate-lang.org)
submitted 1 week ago* (last edited 1 week ago) by [email protected] to c/[email protected]
 

Programming models like the Actor model and the Tuplespace model make great strides toward simplifying programs that communicate. However, a few key difficulties remain.

The Syndicated Actor model addresses these difficulties. It is closely related to both Actors and Tuplespaces, but builds on a different underlying primitive: eventually-consistent replication of state among actors. Its design also draws on widely deployed but informal ideas like publish/subscribe messaging.

For reference, actors and tuple-spaces are means to implement concurrent programs. Actors are essentially tiny programs/processes that send (push) messages to each other, while a tuple-space is a shared repository of data ("tuples") that can be accessed (pulled) by different processes (e.g. actors).

...

A handful of Domain-Specific Language (DSL) constructs, together dubbed Syndicate, expose the primitives of the Syndicated Actor model, the features of dataspaces, and the concepts of conversational concurrency to the programmer in an ergonomic way.

...

To give some of the flavour of working with Syndicate DSL constructs, here's a program written in JavaScript extended with Syndicate constructs:

function chat(initialNickname, sharedDataspace, stdin) {
  spawn 'chat-client' {
    field nickName = initialNickname;

    at sharedDataspace assert Present(this.nickname);
    during sharedDataspace asserted Present($who) {
      on start console.log(`${who} arrived`);
      on stop  console.log(`${who} left`);
      on sharedDataspace message Says(who, $what) {
        console.log(`${who}: ${what}`);
      }
    }

    on stdin message Line($text) {
      if (text.startsWith('/nick ')) {
        this.nickname = text.slice(6);
      } else {
        send sharedDataspace message Says(this.nickname, text);
      }
    }
  }
}

Documentation

Comparison with other programming models

History

Author's thesis

29
submitted 1 week ago* (last edited 1 week ago) by [email protected] to c/[email protected]
 

Even though it's very unlikely to become popular (and if so, it will probably take a while), there's a lot you learn from creating a programming language that applies to other areas of software development. Plus, it's fun!

[–] [email protected] 15 points 1 week ago* (last edited 1 week ago) (1 children)

The Tetris design system:

Write code, delete most of it, write more code, delete more of it, repeat until you have a towering abomination, ship to client.

 

The blog post is the author's impressions of Gleam after it released version 1.4.0. Gleam is an upcoming language that is getting a lot of highly-ranked articles.

It runs on the Erlang virtual machine (BEAM), making it great for distributed programs and a competitor to Elixir and Erlang (the language). It also compiles to JavaScript, making it a competitor to TypeScript.

But unlike Elixir, Erlang, and TypeScript, it's strongly typed (not just gradually typed). It has "functional" concepts like algebraic data types, immutable values, and first-class functions. The syntax is modeled after Rust and its tutorial is modeled after Go's. Lastly, it has a very large community.

 

Zyme is an esoteric language for genetic programming: creating computer programs by means of natural selection.

For successful evolution mutations must generate a wide range of phenotypic variation, a feat nearly impossible when randomly modifying the code of conventional programming languages. Zyme is designed to maximize the likelihood of a bytecode mutation creating a novel yet non-fatal change in program behavior.

Diverging from conventional register or stack-based architectures, Zyme uses a unique molecular automaton-based virtual machine, mimicking an abstract cellular metabolism. This design enables fuzzy control flow and precludes invalid runtime states, transforming potential crashes into opportunities for adaptation.

Very unique, even for an esoteric language. Imagine a program that gets put through natural selection and "evolves" like a species: the program is cloned many times, each clone is slightly mutated, the clones that don't perform as well on some metric are discarded, and the process is repeated; until eventually you have programs that do great on the metric, that you didn't write.

 

Key excerpt:

At OOPSLA 2020, Prof. Dietrich Geisler published a paper about geometry bugs and a type system that can catch them. The idea hasn't exactly taken over the world, and I wish it would. The paper's core insight is that, to do a good job with this kind of type system, you need your types to encode three pieces of information:

  • the reference frame (like model, world, or view space)
  • the coordinate scheme (like Cartesian, homogeneous, or polar coordinates)
  • the geometric object (like positions and directions)

In Dietrich's language, these types are spelled scheme<frame>.object. Dietrich implemented these types in a language called Gator with help from Irene Yoon, Aditi Kabra, Horace He, and Yinnon Sanders. With a few helper functions, you can get Gator to help you catch all the geometric pitfalls we saw in this post.

 

Abstract:

File formats specify how data is encoded for persistent storage. They cannot be formalized as context-free grammars since their specifications include context-sensitive patterns such as the random access pattern and the type-length-value pattern. We propose a new grammar mechanism called Interval Parsing Grammars IPGs) for file format specifications. An IPG attaches to every nonterminal/terminal an interval, which specifies the range of input the nonterminal/terminal consumes. By connecting intervals and attributes, the context-sensitive patterns in file formats can be well handled. In this paper, we formalize IPGs' syntax as well as its semantics, and its semantics naturally leads to a parser generator that generates a recursive-descent parser from an IPG. In general, IPGs are declarative, modular, and enable termination checking. We have used IPGs to specify a number of file formats including ZIP, ELF, GIF, PE, and part of PDF; we have also evaluated the performance of the generated parsers.

[–] [email protected] 126 points 1 week ago (2 children)

But is it rewritten in Rust?

[–] [email protected] 18 points 2 weeks ago* (last edited 2 weeks ago) (1 children)
5
submitted 2 weeks ago* (last edited 2 weeks ago) by [email protected] to c/[email protected]
 

Associated class (Brown University cs1951x)

This is a 209-page book on logical verification in Lean (4.0, the "new" version), available as a PDF.

 

I created Bril, the Big Red Intermediate Language, to support the class's implementation projects. Bril isn't very interesting from a compiler engineering perspective, but I think it's pretty good for the specific use case of teaching compilers classes. Here's a factorial program:

@main(input: int) {
  res: int = call @fact input;
  print res;
}

@fact(n: int): int {
  one: int = const 1;
  cond: bool = le n one;
  br cond .then .else;
.then:
  ret one;
.else:
  decr: int = sub n one;
  rec: int = call @fact decr;
  prod: int = mul n rec;
  ret prod;
}

Bril is the only compiler IL I know of that is specifically designed for education. Focusing on teaching means that Bril prioritizes these goals:

  • It is fast to get started working with the IL.
  • It is easy to mix and match components that work with the IL, including things that fellow students write.
  • The semantics are simple, without too many distractions.
  • The syntax is ruthlessly regular.

Bril is different from other ILs because it prioritizes those goals above other, more typical ones: code size, compiler speed, and performance of the generated code.

Aside from that inversion of priorities, Bril looks a lot like any other modern compiler IL. It's an instruction-based, assembly-like, typed, ANF language. There's a quote from why the lucky stiff where he introduces Camping, the original web microframework, as "a little white blood cell in the vein of Rails." If LLVM is an entire circulatory system, Bril is a single blood cell.

Reference

GitHub

[–] [email protected] 79 points 1 month ago (2 children)

“I’ve got 10 years of googling experience”.

“Sorry, we only accept candidates with 12 years of googling experience”.

[–] [email protected] 3 points 3 months ago

Author's comment on lobste.rs:

Yes it’s embeddable. There’s a C ABI compatible API similar to what lua provides.

[–] [email protected] 29 points 3 months ago* (last edited 3 months ago)

C++’s mascot is an obese sick rat with a missing foot*, because it has 1000+ line compiler errors (the stress makes you overeat and damages your immune system) and footguns.

EDIT: Source (I didn't make up the C++ part)

[–] [email protected] 7 points 3 months ago* (last edited 3 months ago) (2 children)

I could understand method = associated function whose first parameter is named self, so it can be called like self.foo(…). This would mean functions like Vec::new aren’t methods. But the author’s requirement also excludes functions that take generic arguments like Extend::extend.

However, even the above definition gives old terminology new meaning. In traditionally OOP languages, all functions in a class are considered methods, those only callable from an instance are “instance methods”, while the others are “static methods”. So translating OOP terminology into Rust, all associated functions are still considered methods, and those with/without method call syntax are instance/static methods.

Unfortunately I think that some people misuse “method” to only refer to “instance method”, even in the OOP languages, so to be 100% unambiguous the terms have to be:

  • Associated function: function in an impl block.
  • Static method: associated function whose first argument isn’t self (even if it takes Self under a different name, like Box::leak).
  • Instance method: associated function whose first argument is self, so it can be called like self.foo(…).
  • Object-safe method: a method callable from a trait object.
[–] [email protected] 9 points 4 months ago* (last edited 4 months ago)

Java the language, in human form.

[–] [email protected] 3 points 4 months ago* (last edited 4 months ago) (1 children)

I find writing the parser by hand (recursive descent) to be easiest. Sometimes I use a lexer generator, or if there isn’t a good one (idk for Scheme), write the lexer by hand as well. Define a few helper functions and macros to remove most of the boilerplate (you really benefit from Scheme here), and you almost end up writing the rules out directly.

Yes, you need to manually implement choice and figure out what/when to lookahead. Yes, the final parser won’t be as readable as a BNF specification. But I find writing a hand-rolled parser generator for most grammars, even with the manual choice and lookahead, is surprisingly easy and fast.

The problem with parser generators is that, when they work they work well, but when they don’t work (fail to generate, the generated parser tries to parse the wrong node, the generated parser is very inefficient) it can be really hard to figure out why. A hand-rolled parser is much easier to debug, so when your grammar inevitably has problems, it ends up taking less time in total to go from spec to working hand-rolled vs. spec to working parser-generator-generated.

The hand-rolled rules may look something like (with user-defined macros and functions define-parse, parse, peek, next, and some simple rules like con-id and name-id as individual tokens):

; pattern			::= [ con-id ] factor "begin" expr-list "end"
(define-parse pattern
  (mk-pattern
    (if (con-id? (peek)) (next))
    (parse factor)
    (do (parse “begin”) (parse expr-list) (parse “end”))))

; factor 			::= name-id 
; 			 | symbol-literal
; 			 | long-name-id 
; 			 | numeric-literal 
;	 		 | text-literal 
; 			 | list-literal 
; 			 | function-lambda 
; 			 | tacit-arg
; 			 | '(' expr ')' 
(define-parse factor
  (case (peek)
    [name-id? (if (= “.” (peek2)) (mk-long-name-id …) (next))]
    [literal? (next)]
    …))

Since you’re using Scheme, you can almost certainly optimize the above to reduce even more boilerplate.

Regarding LLMs: if you start to write the parser with the EBNF comments above each rule like above, you can paste the EBNF in and Copilot will generate rules for you. Alternatively, you can feed a couple EBNF/code examples to ChatGPT and ask it to generate more.

In both cases the AI will probably make mistakes on tricky cases, but that’s practically inevitable. An LLM implemented in an error-proof code synthesis engine would be a breakthrough; and while there are techniques like fine-tuning, I suspect they wouldn’t improve the accuracy much, and certainly would amount to more effort in total (in fact most LLM “applications” are just a custom prompt on plain ChatGPT or another baseline model).

view more: next ›