Refactor Syscall Creation Script

Project description

The current FreeBSD system call creation script, sys/tools/makesyscalls.lua, was implemented by Kyle Evans and iterated on by Brooks Davis. Its purpose is to streamline the introduction of system calls into the FreeBSD kernel. makesyscalls.lua is a transliteration of FreeBSD’s original makesyscalls.awk script from awk to Lua. It is a monolithic script that retains much of the procedural style of the awk code. It does not take full advantage of Lua's features, such as modules and metatable-based object-oriented design, and it has been difficult to extend with new features.

FreeBSD system call creation will be further streamlined by a complete refactor of makesyscalls.lua into easily extensible objects and dynamically called modules. The outcome is a new library built out of makesyscalls.lua: Lua Syscall Generator. The driving design goals are that it be (1) easier to extend, (2) more approachable, and (3) flexible enough to fit different needs. Remnants of the original awk way of doing things will be cleaned up into modern design paradigms. Lua modules and "classes" will provide better namespacing of globals, dynamic generation of output files, and decoupling of the original code. After the refactor, the previous functionality of makesyscalls.lua can be achieved simply by calling the necessary modules. By taking advantage of modern Lua features and object-oriented design, makesyscalls.lua will become a library of useful tools for kernel developers to build upon.

The benefits to FreeBSD are further streamlined system call creation (with even more guardrails than makesyscalls.lua already has), a more maintainable interface, and a strong foundation for developing future system call creation tools. There is a clear intent to expand on the work of Kyle Evans and Brooks Davis; my refactor will address that intent and provide an extensible interface for doing so. It will further demystify the process of system call creation and make it easier for others to contribute.

Warner Losh has done preliminary work on refactoring makesyscalls.lua, which will serve as the basis for my refactor. It is unfinished and does not incorporate Brooks Davis' recent commits. A successful project will finish the pre-established work, incorporate the recent commits, and achieve the design outcomes stated above. Much work remains, and critical design choices still have to be made (e.g., is it better to have local write procedures or a class interface?). Decoupling the procedural code in a well-thought-out and extensible way is both the major obstacle and the motivation for the project.

Approach to solving the problem

Currently:
process_args():

process_syscall_def = λ:

loop write_line() procedure of producing auto-generated files:

How to solve:

Further solutions will be decided during the refactor, since it is important to settle on a final interface before committing to final solutions.

Deliverables

  1. System call creation will work as before

  2. makesyscalls.lua is refactored into core, modules, and classes

  3. System call creation library is easily extensible (It should provide a basis for future system call creation scripts)

  4. Well-documented (e.g., "bsd_foo will be generated", how to opt-out of complex generation, etc.)

Milestones


Test Plan

Lua is an interpreted scripting language, which gives two big advantages for testing: rapid iteration and a REPL. There are many moving pieces in makesyscalls.lua, so pcall() and assert() may be used liberally. All write procedures will first use print() before being switched to io.write(), and their output will be written to a temporary file and inspected with cat for visual comparison. Methods will be built up piecemeal, with unit tests for the correct behavior and output of each method. Because makesyscalls.lua currently functions correctly, certain Lua snippets will be assumed correct when refactored (e.g., isptrtype()), although unit testing will still be used while building up methods. Third-party testing libraries may also be employed.
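The unit-testing style described above (assert() for expected outputs, pcall() for error paths) might look like the following sketch. is_ptr_type is a hypothetical stand-in for isptrtype(); its rules are an assumption for illustration, not the actual makesyscalls.lua implementation.

```lua
-- Hypothetical helper for illustration only; the real isptrtype() in
-- makesyscalls.lua may use different rules.
local function is_ptr_type(t)
    if type(t) ~= "string" then
        error("is_ptr_type: expected a string, got " .. type(t))
    end
    -- Plain (non-pattern) search for '*', plus the legacy caddr_t typedef.
    return t:find("*", 1, true) ~= nil or t == "caddr_t"
end

-- assert() checks expected outputs directly.
assert(is_ptr_type("const char *"))
assert(is_ptr_type("caddr_t"))
assert(not is_ptr_type("int"))

-- pcall() checks that bad input raises an error instead of crashing the run.
local ok, err = pcall(is_ptr_type, 42)
assert(not ok)
assert(err:find("expected a string", 1, true))

print("all is_ptr_type checks passed")
```

The same snippets can be pasted into the flua REPL one at a time while a method is being built up.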

Because this is a refactor, regression testing will be the primary testing procedure.
Since a refactor limits how much can be tested before all the pieces come together, an extended period of live regression testing will follow the completion of the refactor. I will patch as necessary until system call creation works as before.
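Regression testing here amounts to checking that each file generated by the refactor matches the file generated by makesyscalls.lua. A minimal flua sketch of such a check is shown below; compare_files and the command-line usage are hypothetical, and diff(1) serves the same purpose from the shell.

```lua
-- Compare two generated files line by line and report the first difference.
-- Returns true if identical, or false plus a description of the mismatch.
-- Note: io.lines() strips newlines, so a missing final newline is not caught.
local function compare_files(expected_path, actual_path)
    local expected, actual = {}, {}
    for line in io.lines(expected_path) do expected[#expected + 1] = line end
    for line in io.lines(actual_path) do actual[#actual + 1] = line end

    local n = math.max(#expected, #actual)
    for i = 1, n do
        if expected[i] ~= actual[i] then
            return false, string.format("line %d: expected %q, got %q",
                i, tostring(expected[i]), tostring(actual[i]))
        end
    end
    return true
end

-- Hypothetical usage: flua compare.lua old/syscalls.c new/syscalls.c
if arg and #arg == 2 then
    local same, msg = compare_files(arg[1], arg[2])
    print(same and "identical" or msg)
    os.exit(same and 0 or 1)
end
```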

The Code

Code:

Refactor makesyscalls.lua #1362

Proposal:

proposal.pdf

Class Design Diagram:

class-design-diagram.pdf

Outcomes

Project Refactor Syscall Creation Script, now formally called Lua Syscall Generator (LSG), is a refactor of FreeBSD's makesyscalls.lua as a library.

  1. To meet the deliverable that system call creation will work as before:

lsg/src $ /usr/libexec/flua freebsd.lua ../sys/kern/syscalls.master

This generates the same files, and behaves the same, as makesyscalls.lua.

  2. To meet the deliverable that makesyscalls.lua is refactored into core, modules, and classes:

lsg/src $ /usr/libexec/flua module-name ../sys/kern/syscalls.master

Modules: syscalls.lua, syscall_h.lua, syscall_mk.lua, init_sysent.lua, systrace_args.lua, sysproto_h.lua

This generates only the file associated with that module, and no other files.
This accomplishes the goal of dynamic file generation, a useful tool for kernel developers.

  3. To accomplish the deliverable of a system call creation library that is easily extensible:

An entirely different design of makesyscalls.lua, including:

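-- Method on the bsdio class: write a line through the wrapped file handle
-- (self.bsdio); assert() raises an error if the write fails.
function bsdio:write(line)
        assert(self.bsdio:write(line))
end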
  4. To accomplish the deliverable of being well-documented:

Procedures and semantics are thoroughly explained in comments throughout the library. Along with function explanations, anything unclear in makesyscalls.lua is now explained and commented. Usage of the library is documented in the project's GitHub README.
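The bsdio:write() method listed under the extensibility deliverable is one piece of the class-based design. A minimal sketch of how such a wrapper class might be assembled with a metatable is shown below; the constructor new() and the close() method are assumptions for illustration, not LSG's actual code.

```lua
-- Hypothetical sketch of a minimal output-wrapper "class" in the style of
-- the bsdio:write() method; constructor and close() are assumptions.
local bsdio = {}
bsdio.__index = bsdio

-- Wrap an already-open handle (or any object whose write method returns a
-- truthy value on success and nil, errmsg on failure, like io.open handles).
function bsdio:new(handle)
    return setmetatable({ bsdio = handle }, self)
end

-- Write one line; assert() turns a failed write (nil, errmsg) into an error.
function bsdio:write(line)
    assert(self.bsdio:write(line))
end

function bsdio:close()
    self.bsdio:close()
end

-- Usage: generators receive a writer object instead of calling io.write().
local out = bsdio:new(io.stdout)
out:write("/* example line */\n")
```

Passing the writer in, rather than hard-coding io.write(), is what lets a module write to a real file in freebsd.lua and to stdout or a test buffer when run standalone.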

Final Outcome:
Overall, the project was very successful. The goal of decoupling makesyscalls.lua was accomplished and the library is much clearer, easier to work with, and easier to make changes to.

Todo

Looking Forward

FreeBSD's great! I've learned more than I could have asked for, especially working with such experienced developers. FreeBSD is (in my opinion) the ultimate UNIX experience. It has everything that's needed and has a great development vision on how it accomplishes that.

Continue to use FreeBSD, master UNIX skills, and contribute where I see fit. Look forward to next summer's GSoC - if I have a good proposal B-)

Discussion

Class Design Diagram
There were some uncertainties about the class design diagram: mainly whether there should be inclusion/exclusion arguments for the modules, or whether they were not worth the time. Also, whether class sysproto was needed or would only be an empty interface for its own sake. class scret already felt unnecessary, but since it was included in the original proposal, it was kept to uphold the deliverables.

Outcome:
Due to the above, both the inclusion/exclusion arguments for the modules and class sysproto were dropped from the class design diagram.
The primary design decision I made to preserve the library's ergonomics and dynamic file generation was to have freebsd.lua call all of the modules (preserving the functionality of makesyscalls.lua), while using a nifty Lua 5.4 trick that allows each module to also be run as a standalone script. Here's the snippet:

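-- Check if the script is run directly (not loaded via require()).
-- When required, stack level 4 exists; when run as the main chunk it does
-- not, so debug.getlocal raises an error and pcall returns false.
if not pcall(debug.getlocal, 4, 1) then
    -- Entry of script
    if #arg < 1 or #arg > 2 then
        error("usage: " .. arg[0] .. " syscall.master")
    end
    -- ... standalone generation continues here ...
end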

This trick allows the script to determine whether it is being run as a module or standalone; the condition then guards the standalone entry point. It is inspired by Python's if __name__ == "__main__": construct.

Temporary Files
Whether to continue using temporary files (as makesyscalls.lua did) was a major design decision that required careful consideration. makesyscalls.lua relied on writing to a number of temporary files and then “stitching” them back together in the proper order. My mentors strongly indicated that all traces of makesyscalls.lua should be removed and that LSG should be designed as new.

Outcome:
Given that desire, the problem to solve was that not everything can be generated on the first write pass. My approach was to “cache” lines at different “storage levels”: during primary generation, lines meant for later output are stored, each at the level where it belongs. Finally, the storage cache is unrolled and written. This removes the need to stitch temporary files together by caching in memory instead. It is a more modern solution, in line with the goals of the refactor.
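The storage-level cache described above can be sketched in a few lines of Lua. The names cache, store, and unroll are hypothetical, and the assumption that lower levels are written first is for illustration; this is not LSG's actual API.

```lua
-- Hypothetical sketch of the "storage level" cache; names are illustrative.
local cache = {}
cache.__index = cache

function cache:new()
    return setmetatable({ levels = {} }, self)
end

-- Defer a line to a numbered storage level (lower levels are written first).
function cache:store(line, level)
    local bucket = self.levels[level]
    if bucket == nil then
        bucket = {}
        self.levels[level] = bucket
    end
    bucket[#bucket + 1] = line
end

-- Write every stored line, level by level, in insertion order within a
-- level, then empty the cache.
function cache:unroll(write)
    local order = {}
    for level in pairs(self.levels) do order[#order + 1] = level end
    table.sort(order)
    for _, level in ipairs(order) do
        for _, line in ipairs(self.levels[level]) do write(line) end
    end
    self.levels = {}
end

-- Usage: during the single pass over syscalls.master, lines that belong
-- later in the output are cached instead of going to temporary files.
local c = cache:new()
c:store("/* trailer */\n", 2)
c:store("struct foo_args;\n", 1)
c:store("struct bar_args;\n", 1)
c:unroll(io.write)
```

Because the levels are just table keys, a generator can add a new output section by picking a new level number, with no extra temporary file to create, open, and concatenate.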

makesyscalls.lua
A major obstacle, as an outsider and even more so as a student, was that the original makesyscalls.lua is difficult to parse. In fact, that problem is the primary motivation for the refactor: the script needs to be more approachable, easier to maintain, and easier to extend, to streamline the system call creation process for both new contributors and experienced developers. A continuing part of the project was unraveling and understanding the different moving pieces, all deeply nested and bundled up in the monolithic, procedural interface of makesyscalls.lua, and then unifying those features into a more modern, decoupled, object-oriented interface. At many points during the project, a "good enough" understanding of the specifics was not enough to accomplish the refactor. An intricate and deep understanding of makesyscalls.lua was required to produce an effective and readable refactor, which was not an easy ask.

Outcome:
There were multiple changes of pace as new things were discovered, old assumptions were found to be incorrect, and the vision of what the refactor should look like was constantly refined.

Documentation
Similar to the above problem was a lack of documentation. Comments in the library help kernel developers navigate it and extend it to fit their needs; however, there were virtually none before.

Outcome:
My refactor involved elucidating a lot of the inner workings of makesyscalls.lua with comments and documentation, and also being sure to set that standard moving forward for the library.

Bitmasks and bitwise operations
makesyscalls.lua used bitmasks and bitwise operations to accomplish its “sorting” of system calls. Although this works, memory is cheap now, and bitmasks are a convention that was more popular in the past.

Outcome:
Warner and Kyle called for a complete removal of bitmasks because of their lack of clarity and the difficulty of working with them, and I agreed. Bitmasks and bitwise operations are now completely gone; flags are declaratively named as part of the system call data structure, to match the readability of syscalls.master.
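The difference between the two approaches can be illustrated with a small contrast. The constant values, field names, and parse_types helper below are hypothetical, not the ones used by makesyscalls.lua or LSG; the type strings follow the "|"-separated type column of syscalls.master (e.g. STD|NOPROTO).

```lua
-- Illustrative contrast only; constants and field names are hypothetical.
-- Requires Lua 5.3+ for the native bitwise operators (flua is Lua 5.4).

-- Bitmask style: each type is a bit, and every test needs bitwise arithmetic.
local STD     = 0x1
local NOPROTO = 0x2
local COMPAT  = 0x4

local entry_mask = STD | NOPROTO
local has_noproto = (entry_mask & NOPROTO) ~= 0
local is_compat = (entry_mask & COMPAT) ~= 0

-- Declarative style: flags are named fields that mirror the type column
-- of syscalls.master.
local function parse_types(field)
    local types = {}
    for name in field:gmatch("[^|]+") do
        types[name] = true
    end
    return types
end

local entry = { num = 0, name = "example", type = parse_types("STD|NOPROTO") }

print(has_noproto, is_compat)
print(entry.type.NOPROTO == true, entry.type.COMPAT == nil)
```

In the declarative version a reader sees entry.type.NOPROTO, the same word that appears in syscalls.master, instead of having to look up which bit NOPROTO occupies.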

Lua
A personal difficulty I had to overcome was my limited experience with Lua. Although I feel confident in shell and have reasonable experience with Perl, Lua was an uncomfortable language for me. I had some experience with Lua from Neovim configuration, which is why I proposed this project and was excited to improve my skills. Many Lua semantics and Lua's “bare-bones” approach were learning points for me. I gained immense value from it, though, and now feel more confident in Lua than in the languages I was already confident in. I've also become a more flexible programmer; being able to work fluidly in different and new technologies is both a desirable and an important skill.

Outcome:
I've learned to love all the things that made me uncomfortable about Lua. I'm overjoyed to have it as a useful skill in my programming toolset. Lua offers more advanced scripting capabilities than a basic shell, avoids the readability and convolution problems of Perl, and doesn't carry the weight of the Python interpreter. It's great!

Final Thoughts
Overall, the project went mostly smoothly. Unfortunately, my mentors were very busy and had limited time, which left much of the research and decision-making to me. Being able to execute on deliverables in the absence of instruction is a valuable skill, so I benefited from having that independence and the authority to make decisions to fulfill the project. There were setbacks because of important design decisions that needed consideration or refactoring, and because of breaking apart the generally very unapproachable code. Setting aside those points of necessary project growth, the project proceeded at the expected pace.

Work Log

Bonding Period

Warner requested a class design diagram. I had finals for school, so time was limited. I produced the class design diagram and received a "This looks great to my eye" from Warner. The class diagram was successfully completed.

Other things that I accomplished:

Communication and precedents of the project were established.

Week One

Day One

Committing to https://github.com/agge3/lsg

Rest of Week

That was achieved.

Having difficulties, so they're not called anywhere yet and need work.
I'm going to keep going in the same direction.

Week Two

A couple of areas need attention and are definitely not a final implementation, but they are a good starting point.

Questions are:

  1. Is everything looking as expected? Am I on the right track?

  2. My scarg:add() function is inserting a nested table, which doesn't seem ideal. I'm having trouble thinking of another way, given how Kyle originally handled 64-bit types.

  3. On my class design, did you like wrapping the function data into scproto.lua, or do you want to stick with scarg and scret?

  4. Are my states for syscall:add() what you meant, by explicit state names?

Week Three

Questions are:

  1. On class scarg, I was wondering: Instead of it being invoked line by line with class syscall, what about it invoking its own subroutine so that it can capture all the args for the entry? Have that be the argument table, and then return control to class syscall when it's done. That way the arguments for the entry are more of a packaged unit, and aren't as coupled with class syscall.
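The proposed subroutine could look roughly like the sketch below: the collector takes over the line iterator, consumes one entry's argument lines, and returns them as a single table before control goes back to class syscall. collect_args is hypothetical, and the parsing is simplified to one argument per line with the list closed by a line beginning with ");", as in the multi-line entries of syscalls.master.

```lua
-- Hypothetical sketch: consume argument lines until the closing ");" and
-- return them as one packaged table. Parsing is deliberately simplified.
local function collect_args(next_line)
    local args = {}
    for line in next_line do
        if line:match("^%s*%);") then
            return args
        end
        -- Strip surrounding whitespace and a trailing comma.
        local decl = line:match("^%s*(.-),?%s*$")
        if decl ~= "" then
            args[#args + 1] = decl
        end
    end
    error("unterminated argument list")
end

-- Usage with lines shaped like the write(2) entry in syscalls.master.
local lines = {
    "\t\t    int fd,",
    "\t\t    _In_reads_bytes_(nbyte) const void *buf,",
    "\t\t    size_t nbyte",
    "\t\t);",
}
local i = 0
local function next_line() i = i + 1; return lines[i] end
local args = collect_args(next_line)
-- args: { "int fd", "_In_reads_bytes_(nbyte) const void *buf", "size_t nbyte" }
```

Because collect_args only needs a function that returns the next line, class syscall can hand it the same iterator it is already reading from, and the resulting table is the entry's argument list as a single unit.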

Mentor Input:

Response:

Week Four

Specifically, the comment type is an optional parameter, and the function can also handle multi-line text containing newlines.

Problem:
This output:

util.generated_tag("System call argument to DTrace register array conversion\nThis file is part of the DTrace syscall provider")
/*
 * System call argument to DTrace register array conversion
 * This file is part of the DTrace syscall provider
 *
 * DO NOT EDIT-- this file is automatically @generated
 */

> util.generated_tag("FreeBSD system call object files.", "#")
#
 # FreeBSD system call object files.
 #
 # DO NOT EDIT-- this file is automatically @generated.
 #

It doesn't match the old script exactly, but (I think) it's good enough.

Answer: It's good enough.
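The misalignment in the "#" output above comes from reusing the C-style " *" continuation indent. One way to avoid it is to choose the continuation prefix from the comment style. The sketch below is hypothetical and is not LSG's util.generated_tag; only the (str, comment) parameter order is taken from the calls shown above.

```lua
-- Hypothetical sketch (not LSG's util.generated_tag): the continuation
-- prefix depends on the comment style, so "#" headers stay left-aligned.
-- The default style is a C block comment.
local function generated_tag(str, comment)
    comment = comment or "/*"
    local first, prefix, last
    if comment == "/*" then
        first, prefix, last = "/*", " *", " */"
    else
        first, prefix, last = comment, comment, comment
    end

    local out = { first }
    -- Split the description on embedded newlines, one comment line each.
    for line in (str .. "\n"):gmatch("(.-)\n") do
        out[#out + 1] = (line == "") and prefix or (prefix .. " " .. line)
    end
    out[#out + 1] = prefix
    out[#out + 1] = prefix .. " DO NOT EDIT-- this file is automatically @generated."
    out[#out + 1] = last
    return table.concat(out, "\n") .. "\n"
end

io.write(generated_tag("FreeBSD system call object files.", "#"))
```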

Questions are:

  1. I'm wondering whether you like that idea. Also, I haven't decided whether it should be a module or a class. I was thinking that having some sort of state could prove useful (i.e., a class).

Created a new branch, "separating-modules", that's VERY WIP and just me trying to sort out what goes where.
That's probably the direction I'll keep going. I'd like to wait for feedback before messing more with what I've been working on.

Important Questions are:

  1. What type(s) in syscalls.master are noncompat? Specifically, what is handle_noncompat() referring to?

  2. I would like some suggestions on documentation and style. I found Kyle's blog post on flua, which implies that style is still being figured out. Warner doesn't 100% follow style.lua(9).

    • How should I be documenting?

    • Tell me if I'm doing anything wrong style-wise

Mentor Input:

Also, both Warner and Kyle provided more insight into style, which can be found on GitHub: https://github.com/bsdimp/lsg/pull/1

Response:

  1. Correct all of the inconsistencies in Warner's style, align the style to makesyscalls.lua, and also thoroughly comment and document any existing work and any future work.
  2. Meticulously align the style with style.lua(9) and style(9).

Week Four

A lot of discoveries were made while working through the modules and the specific file generation.

Summary:

Communication to Mentors:

Week Five

Mainly addressing some remaining XXX comments and fully moving away from bitmasks.

Personal Thoughts:

It all seems to be working mostly correctly. I saw that a comment got lost at a certain point, which offset the following entries; that goes back to validating the syscall numbers, which is not currently implemented. Otherwise, everything, config included, seems to be working reasonably well.
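The missing check mentioned above can be sketched as a pass over the parsed entries that verifies syscall numbers are consecutive, so a lost or duplicated entry is reported instead of silently shifting everything after it. validate_numbers and the { num, name } entry shape are hypothetical, and entries that cover a range of numbers are not handled in this sketch.

```lua
-- Hypothetical sketch: syscall numbers must be consecutive starting at 0,
-- so report the first gap or repeat. Range entries are not handled here.
local function validate_numbers(entries)
    local expected = 0
    for i, entry in ipairs(entries) do
        if entry.num ~= expected then
            return false, string.format(
                "entry %d (%s): syscall number %d, expected %d",
                i, entry.name, entry.num, expected)
        end
        expected = expected + 1
    end
    return true
end

-- Usage: an entry lost while parsing shows up as a numbering gap.
local ok, err = validate_numbers({
    { num = 0, name = "syscall" },
    { num = 1, name = "exit" },
    { num = 3, name = "read" },
})
print(ok, err)
```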

Currently, what I have is what I'd like to call the "early-this-week" consolidation point.

Midterm Evaluation Completed

Week Six

I’ll keep the main branch of my fork as it is whenever there’s time for a review, to serve as a checkpoint.

Doing both because, between the two, that will pretty much get everything in the library where it needs to be.

For testing, I’ve just been catting the output to a tmp file, or using the Lua interpreter for small functions.

Mentor Input:

Response:

Week Seven

The off output is just from a wrong case. (NOTE: It was NOT)

Personal Thoughts:

Week Eight

init_sysent took longer than expected.

It turned out that for the syscall iterator a shallow copy is fine for the range, but for a full syscall a deep copy is necessary because of the now-nested args table.
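The distinction above is a general Lua one: a shallow copy duplicates only top-level fields, so a nested args table stays shared with the original. The sketch below illustrates it; the syscall table layout is a simplified assumption, not LSG's data structure.

```lua
-- Shallow copy: only the top-level fields are duplicated; nested tables
-- (such as an args table) are still shared with the original.
local function shallow_copy(t)
    local copy = {}
    for k, v in pairs(t) do copy[k] = v end
    return copy
end

-- Deep copy: nested table values are copied recursively, so the copy can be
-- modified without affecting the original. (Cycles and metatables are not
-- handled in this sketch.)
local function deep_copy(t)
    if type(t) ~= "table" then return t end
    local copy = {}
    for k, v in pairs(t) do copy[k] = deep_copy(v) end
    return copy
end

local syscall = {
    num = 4, name = "write",
    args = { "int fd", "const void *buf", "size_t nbyte" },
}

local s = shallow_copy(syscall)
s.args[1] = "changed"   -- also changes syscall.args[1] (shared table)

local d = deep_copy(syscall)
d.args[3] = "changed"   -- syscall.args[3] is untouched
```

A shallow copy is enough when only scalar fields are rewritten, as with a number range, while a full syscall whose args may be modified needs the deep copy.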

It’s still not perfect, but it’s mostly there.

Instead of continuing with temp files, like sysproto.h (which has a lot of temp files), I’m storing lines that can’t be written right away in a table with different indexes, which then unravels itself at the end.
This is more in line with the goals discussed with my mentor.

As the project deadline gets closer, my plan is to have all the modules mostly working and then address the small details to get them all the way there. That's what I'll be continuing with this week.

Week Nine

Mentor Input:

Response:

  1. Continued with the previous plans. Everything’s mostly working, and I’ve cleaned up a lot of things too. I’m not yet fully confident that everything works correctly, so I’m going to keep comparing outputs to confirm it does, or fix it if it doesn’t.
  2. I was really hoping we could have a review so I could gauge the work I’ve done, and get some direction on what you want, what you don’t want, or whether it looks good. That clarity would help me focus and make a great contribution.
  3. (Very not expected) I had to do some moving last weekend, which left me with less time than I’d like. Also why I’m messaging now and not sooner.

EDIT: I just saw your email, so yes, but I would like you to look at it first. All my recent work is on the init_sysent branch of https://github.com/agge3/lsg.

Final Submission

Mentor Input:

Response:

Adding Syscalls to FreeBSD:

Brooks Davis’ recent commits:



SummerOfCode2024Projects/RefactorSyscallCreationScript (last edited 2024-11-01T00:36:53+0000 by MarkLinimon)