Refactor Syscall Creation Script
Student: TylerBaxter (agge@FreeBSD.org)
Mentor: WarnerLosh (imp@FreeBSD.org), KyleEvans (kevans@FreeBSD.org)
Project description
The current FreeBSD system call creation script, sys/tools/makesyscalls.lua, was implemented by Kyle Evans and iterated on by Brooks Davis. Its purpose is to streamline the introduction of system calls into the FreeBSD kernel. makesyscalls.lua is a transliteration of FreeBSD’s original makesyscalls.awk script from awk to Lua. It’s a monolithic script that has kept much of the procedural awk code. It does not take full advantage of the Lua features that support object-oriented design, and it has been difficult to extend.
FreeBSD system call creation will be further streamlined by a complete refactor of makesyscalls.lua into easily extensible objects and dynamically called modules. The outcome is a new library constructed out of makesyscalls.lua: Lua Syscall Generator. The driving design goals are that it will be (1) easier to extend, (2) more approachable, and (3) more flexible to fit different needs. Remnants of the original awk way of doing things will be cleaned up and replaced with modern design paradigms. Lua modules and "classes" will provide better namespacing of globals, dynamic generation of output files, and decoupling of the original code. After refactoring, the previous functionality of makesyscalls.lua can be achieved simply by calling the necessary modules. By taking advantage of modern Lua features and object-oriented design, makesyscalls.lua will become a new library of useful tools for kernel developers to build upon.
The benefits to FreeBSD are further streamlined creation of system calls (with even more guardrails than makesyscalls.lua already had), a more maintainable interface, and a strong foundation for the development of future system call creation tools. There is a clear intent to expand on the work of Kyle Evans and Brooks Davis, and my refactor will address that intent and provide an extensible interface to do so. It will further demystify the process of system call creation and allow others to contribute more easily.
Warner Losh has done preliminary work on refactoring makesyscalls.lua, which will serve as the basis for my refactor. It is unfinished and does not incorporate Brooks Davis' recent commits. A successful project will finish the pre-established work, incorporate recent commits, and meet the previously stated design outcomes. There is still much work to be done and critical design choices to be made (e.g., is it better to have local write procedures or a class interface?). Uncoupling the procedural code in a well-thought-out and extensible way is the major obstacle and motivation for the project.
Approach to solving the problem
Currently:
process_args():
Replaces argument types with respective ABI config types, for padding expansion macros
Adds processed arguments (type and name) to a table
Returns table and changes_abi flag
bool changes_abi is a flag for ABI changes from native
bool check_abi_changes() checks for ABI changes from native
strip_arg_annotations() removes Microsoft SAL and leaves just type
process_syscall_def = λ:
Calls handle_compat(), handle_noncompat(), etc.
Defines and processes system call return type
Flags, function name, function arguments, etc.
loop write_line() procedure of producing auto-generated files:
- Output is written to temporary files, then the files are "stitched" back together in the correct order
handle_compat(), handle_noncompat(), handle_obsol(), handle_reserved() do the bulk of the work
Available data must be sysnum, thr_flag, flags, sysflags, rettype, auditev, funcname, funcalias, funcargs, argalias, syscallret
How to solve:
Much of the heavy lifting will be done by refactoring the procedural code into new OOP objects and methods. By uncoupling the procedural code into OOP interfaces, the library will be harder to break, more approachable, and much easier to extend
The previous write_line() loop that produces the auto-generated files needs to be split into separate module subroutines. Common methods will be put into public interfaces, properly encapsulated, and made reusable
Auto-generation of init_sysent.c, systrace_args.c, sysproto.h has shared data; shared data will be declarative and easily accessible
More solutions will be decided upon in the refactor, as it's important to decide on a final interface before setting final solutions
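As a minimal sketch of what moving procedural code into a Lua "class" could look like, the block below builds a small argument object with a metatable. The names scarg:new(), scarg:process(), and scarg:is_ptr(), and the parsing pattern, are assumptions made for illustration; they are not the interface the refactor settles on.

```lua
-- Hypothetical sketch of a Lua "class" built with a metatable.
-- All method names and the parsing pattern are illustrative assumptions.
local scarg = {}
scarg.__index = scarg

-- Construct an argument object from a raw "type name" declaration.
function scarg:new(decl)
	local obj = setmetatable({}, self)
	obj.decl = decl
	return obj
end

-- Split the declaration into a type and a name.
function scarg:process()
	local argtype, name = self.decl:match("^(.-)%s*([%w_]+)$")
	self.type = argtype
	self.name = name
	return self
end

-- Report whether the processed type is a pointer type.
function scarg:is_ptr()
	return self.type ~= nil and self.type:find("%*") ~= nil
end

local arg1 = scarg:new("const char *path"):process()
print(arg1.type, arg1.name, arg1:is_ptr())
```

Keeping the state on the object (rather than in script-wide locals) is what lets callers process one argument without knowing how the rest of the script works.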
Deliverables
System call creation will work as before
makesyscalls.lua is refactored into core, modules, and classes
System call creation library is easily extensible (It should provide a basis for future system call creation scripts)
Well-documented (e.g., "bsd_foo will be generated", how to opt-out of complex generation, etc.)
Milestones
Milestone 1: Implement class scarg
Mid-term Evaluation: Implement class scarg
- Complete Milestone 1
Milestone 2: Finish implementation of init_sysent.lua module
Milestone 3: Dynamically generate output files (as a side-effect of a more extensible interface)
Final Submission: makesyscalls.lua refactored into a library of modules and classes
- Complete Milestone 2
- Complete Milestone 3
- May 27th: Start of coding
June 24th-28th: Mid-term Evaluation
- Complete Milestone 1
- July 1st-5th
- Complete Milestone 2
- July 15th-19th
- Complete Milestone 3
July 29th: Final Submission - End of coding (soft)
August 5th: Mentor Final Evaluation - End of coding (hard)
Test Plan
Lua is an interpreted scripting language, meaning two great things for testing: rapid iteration and a REPL. There are a lot of moving pieces to makesyscalls.lua. pcall() and assert() may be used liberally. All write procedures will first use print() before finally changing to io.write(), with the output sent to a temporary file for visual comparison. Methods will be built up piecemeal, unit testing correct behavior and outputs of the methods. makesyscalls.lua currently functions correctly, so certain assumptions of Lua snippets will be made (e.g., isptrtype() will be assumed to work when refactored), though unit testing will still be employed to build up methods. Potentially, third-party libraries may also be employed.
Being a refactor, regression testing will be the primary testing procedure.
Because a refactor limits what can be tested along the way, an extended period of live regression testing will follow once my refactor is complete. I will patch as necessary until system call creation works as before.
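One way regression testing of generated output could be approached is a line-by-line comparison of the old and new output. The sketch below is an assumption about how such a check might look, not a tool from the project; split_lines() and first_difference() are hypothetical helpers, and the two strings stand in for file contents that would normally be read with io.open().

```lua
-- Hypothetical helper: split text into a table of lines.
local function split_lines(text)
	local lines = {}
	if text:sub(-1) ~= "\n" then
		text = text .. "\n"
	end
	for line in text:gmatch("(.-)\n") do
		lines[#lines + 1] = line
	end
	return lines
end

-- Return the first line number where the outputs differ, along with
-- both versions of that line, or nil when the outputs are identical.
local function first_difference(old, new)
	local a, b = split_lines(old), split_lines(new)
	for i = 1, math.max(#a, #b) do
		if a[i] ~= b[i] then
			return i, a[i], b[i]
		end
	end
	return nil
end

-- Stand-ins for the output of the old script and the refactored one.
local old_out = "int\tfoo;\nint\tbar;\n"
local new_out = "int\tfoo;\nlong\tbar;\n"
local lineno, expected, actual = first_difference(old_out, new_out)
print(lineno, expected, actual)
```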
The Code
Code:
Proposal:
Class Design Diagram:
Outcomes
Project Refactor Syscall Creation Script, now formally called Lua Syscall Generator (LSG), is a refactor of FreeBSD's makesyscalls.lua as a library.
- To meet the deliverable that system call creation will work as before:
lsg/src $ /usr/libexec/flua freebsd.lua ../sys/kern/syscalls.master
Optionally specified configuration file or a different target syscalls.master may be provided
Generates the same files and behaves the same as makesyscalls.lua
To meet the deliverable that makesyscalls.lua is refactored into core, modules, classes:
lsg/src $ /usr/libexec/flua module-name ../sys/kern/syscalls.master
Optionally specified configuration file or a different target syscalls.master may be provided
Modules: syscalls.lua, syscall_h.lua, syscall_mk.lua, init_sysent.lua, systrace_args.lua, sysproto_h.lua
Will generate the specific file associated with the module and no other files
This accomplishes the goal of dynamic file generation - a useful tool for kernel developers
- To accomplish the deliverable of a system call creation library that is easily extensible:
An entirely different design of makesyscalls.lua, including:
No bitmasks, bit flags, or bitwise operations are done. Types are declarative and match the readability of syscalls.master (e.g., syscall.type.STD)
Processing of arguments and return type is decoupled into classes scarg and scret, respectively.
Common procedures are made globally accessible and reusable in util.lua and config.lua
class bsdio is an IO class that simplifies the calls of best-practice Lua IO calls:
function bsdio:write(line) assert(self.bsdio:write(line)) end
bsdio carries internal state that can be changed dynamically, and provides an interface for common IO macros.
Instead of writing to temporary files and "stitching" them together, bsdio allows caching of different stages of generation in the form of "storage levels". Lines can be stored in their respective storage level, all in one write pass, and unrolled accordingly.
Each module has a very different and more readable procedure for generating files, decoupled from the rest. This lets new contributors approach the library more easily and experienced users extend it more easily.
- To accomplish the deliverable of being well-documented:
Thorough explanations of procedures and semantics are commented throughout the library. Along with function explanations, anything that was unclear in makesyscalls.lua is now explained and commented. Usage of the library is documented in the project's GitHub README.
Final Outcome:
Overall, the project was very successful. The goal of decoupling makesyscalls.lua was accomplished and the library is much clearer, easier to work with, and easier to make changes to.
Todo
- Different ABI targets have not been thoroughly tested
Due to the sheer volume of lines that LSG generates, it’s possible native (amd64) is not 100% correct either. Possibly a tool can be engineered to confirm the output; however, that was not within the scope of the project
Since the goal is to deliver file generation identical to makesyscalls.lua, these issues will be addressed after final review, until successful upstream integration can be accomplished
Looking Forward
FreeBSD's great! I've learned more than I could have asked for, especially working with such experienced developers. FreeBSD is (in my opinion) the ultimate UNIX experience. It has everything that's needed and has a great development vision on how it accomplishes that.
I plan to continue using FreeBSD, master UNIX skills, and contribute where I see fit. I look forward to next summer's GSoC, if I have a good proposal.
Discussion
Class Design Diagram
There were some uncertainties about the class design diagram: mainly whether there should be inclusion/exclusion arguments for the modules, or whether that’s not worth the time. Also, whether class sysproto is needed or whether it’s an empty interface for the sake of having an interface. class scret already has the feeling of being unnecessary, but because it was included in the original proposal, it was kept to uphold the deliverables.
Outcome:
Due to the above, both the inclusion/exclusion arguments for the modules and class sysproto were dropped from the class design diagram.
The primary design decision I made to keep the library ergonomic and preserve dynamic file generation was to have freebsd.lua call all of the modules (preserving the functionality of makesyscalls.lua), but use a nifty Lua 5.4 trick to allow the modules to be run as standalone scripts. Here's the snippet:
-- Check if the script is run directly
if not pcall(debug.getlocal, 4, 1) then
	-- Entry of script
	if #arg < 1 or #arg > 2 then
		error("usage: " .. arg[0] .. " syscall.master")
	end
This trick allows the script to determine whether it is run as a module or standalone. The condition then guards the standalone entry point. It’s inspired by Python's if __name__ == "__main__": construction.
Temporary Files
Deciding whether to keep using temporary files (as makesyscalls.lua did) was a major design decision that took careful consideration. makesyscalls.lua relied on writing to a bunch of temporary files and then “stitching” them back together in the proper order. My mentors strongly indicated that all traces of makesyscalls.lua should be removed. LSG should be designed like new.
Outcome:
Given that desire, I solved the problem that not everything can be generated on the first write pass by “caching” lines at different “storage levels”. During primary generation, lines needed for later stages are stored at their respective level; at the end, the storage cache is unrolled and written. This removes the need to stitch temporary files together by caching in memory instead. It's a more modern solution, in line with the goals of the refactor.
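To illustrate the idea of storage levels, here is a minimal sketch under stated assumptions. It is not the bsdio implementation: the linecache name, its store() and unroll() methods, and the in-memory sink that stands in for a file handle are all hypothetical.

```lua
-- Hypothetical sketch of in-memory "storage levels"; not LSG's bsdio API.
local linecache = {}
linecache.__index = linecache

function linecache:new(out)
	return setmetatable({ out = out, levels = {}, maxlevel = 0 }, self)
end

-- Write a line immediately; fail loudly if the write fails.
function linecache:write(line)
	assert(self.out:write(line))
end

-- Defer a line to a later stage of the generated file.
function linecache:store(level, line)
	local bucket = self.levels[level]
	if bucket == nil then
		bucket = {}
		self.levels[level] = bucket
	end
	bucket[#bucket + 1] = line
	if level > self.maxlevel then
		self.maxlevel = level
	end
end

-- Flush deferred lines in ascending level order, then clear the cache.
function linecache:unroll()
	for level = 1, self.maxlevel do
		for _, line in ipairs(self.levels[level] or {}) do
			self:write(line)
		end
	end
	self.levels, self.maxlevel = {}, 0
end

-- A tiny in-memory sink standing in for a file handle.
local sink = { buf = {} }
function sink:write(s)
	self.buf[#self.buf + 1] = s
	return self
end

local cache = linecache:new(sink)
cache:store(2, "#endif\n")
cache:write("#ifndef _EXAMPLE_H_\n")
cache:store(1, "struct example_args;\n")
cache:unroll()
print(table.concat(sink.buf))
```

The lines are stored out of order during a single pass, yet come out in level order when unrolled, which is the property that previously required separate temporary files.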
makesyscalls.lua
A major obstacle to overcome was that, for an outsider, and even more so for a student, the original makesyscalls.lua is difficult to parse. In fact, that problem is the primary motivation of the refactor: the script needs to be more approachable, easier to maintain, and easier to extend, to streamline the system call creation process for both new contributors and experienced developers. A continuing task throughout the project was unraveling and understanding the different moving pieces, all deeply nested and bundled up in the monolithic and functional interface of makesyscalls.lua, and then unifying those features into a more modern, decoupled, object-oriented interface. During the project, there were many points where having a "good enough" idea of the specifics was not enough to accomplish the refactor. An intricate and deep understanding of makesyscalls.lua was required to conduct an effective and readable refactor--not an easy ask.
Outcome:
The pace changed several times as new things were discovered, old assumptions were found to be incorrect, and the vision of what the refactor should look like was steadily refined.
Documentation
A related problem was the lack of documentation. Comments in the library help kernel developers navigate and extend it to fit their needs, but there were virtually none before.
Outcome:
My refactor involved elucidating a lot of the inner workings of makesyscalls.lua with comments and documentation, and also being sure to set that standard moving forward for the library.
Bitmasks and bitwise operations
makesyscalls.lua used bitmasks and bitwise operations to accomplish its “sorting” of system calls. Although this works, memory is cheap now and bitmasks are more a convention of the past.
Outcome:
Warner and Kyle called for a complete removal of bitmasks because of their lack of clarity and difficulty to work with, and I agreed. Bitmasks and bitwise operations are now completely gone, and flags are declaratively named as part of the system call data structure, to align with the readability of syscalls.master.
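As a rough sketch of declaratively named flags, the block below parses a type field into a set of names and checks membership by name rather than by bit. The helper names parse_types() and is_noncompat() are assumptions for illustration; the set of noncompat names is the one Kyle lists in the work log (STD/NODEF/NOARGS/NOPROTO/NOSTD).

```lua
-- Hypothetical sketch: named type flags instead of a bitmask.
-- Function and field names are illustrative, not LSG's actual API.
local noncompat = {
	STD = true, NODEF = true, NOARGS = true, NOPROTO = true, NOSTD = true,
}

-- Parse a "|"-separated type field into a set keyed by type name.
local function parse_types(field)
	local types = {}
	for name in field:gmatch("[^|%s]+") do
		types[name] = true
	end
	return types
end

-- True when any of the entry's types is in the noncompat set.
local function is_noncompat(types)
	for name in pairs(types) do
		if noncompat[name] then
			return true
		end
	end
	return false
end

local entry = { name = "example", type = parse_types("STD|CAPENABLED") }
print(entry.type.STD, is_noncompat(entry.type))
```

A check such as entry.type.STD reads the same way the type appears in syscalls.master, which is the readability gain the removal of bitmasks was after.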
Lua
A personal difficulty that I had to overcome was my limited experience with Lua. Although I feel confident in shell and have reasonable experience with Perl, Lua was an uncomfortable language for me. I have had experience with Lua in NeoVim configuration, which is why I proposed this project and was excited to improve my skills. Many Lua semantics and the “bare-bones” approach of Lua were learning points for me. I gained immense value from it, though, and now feel more confident in Lua than in the languages I was already comfortable with. I’ve also improved my flexibility as a programmer--being able to work in different and new technologies fluidly is both a desirable and important skill to have.
Outcome:
I've learned to love all the things that made me uncomfortable about Lua. I'm overjoyed to have it as a useful skill in my programming toolset. Lua offers more advanced scripting capabilities than a basic shell, avoids the readability and convolution problems of Perl, and doesn't have the weight of the Python interpreter. It's great!
Final Thoughts
Everything in the project went mostly smoothly. Unfortunately my mentors were very busy and had limited time, leaving a lot of resourcing and decisions for me to make. It’s a valuable skill to be able to execute on deliverables in the absence of instruction, so I benefited from having that independence and executive decision-making to fulfill the project. There were setbacks because of important project design decisions that needed consideration or refactoring, and also because of the work of breaking apart the generally very unapproachable code. Setting aside points of necessary project growth, everything went mostly smoothly and proceeded at the expected pace.
Work Log
Bonding Period
Warner requested a class design diagram. I had to take my finals for school, so time was limited. I produced the class design diagram and received a "This looks great to my eye" from Warner. Class diagram successfully accomplished.
Other things that I accomplished:
I've forked Warner's WIP repository.
I've added a design.md. It's available here
I'm on FreeBSD-CURRENT and building from source, so my environment is completely set up.
I've been doing the other administration stuff, just waiting on email@FreeBSD.org
Communication and precedents of the project were established.
Week One
Day One
- Brought the necessary things from process_args() over.
- Split off syscall:addargs() from the previous syscall:add() method.
- Have a good amount of process_args() close to working in class scarg.
- A couple public methods of scarg:
init()
process()
add()
- And local brought over from original:
check_abi_changes()
strip_abi_prefix()
is_xxx_type()
strip_arg_annotations()
- Left comments all over the place on anything I'm unsure of/haven't decided on yet.
Committing to https://github.com/agge3/lsg
Rest of Week
- Continued to work towards testable/"is working as before" with all the things from process_args() brought over, and new class scarg.
That was achieved.
- Created a testing branch on my repo, to mess around with.
Added a local DataDumper module, so I can dump tables when I'm debugging.
- Spun off all of the states from syscall:add()
Having difficulties, so they're not called anywhere and need work.
Going to keep continuing in the same direction.
Week Two
- Was able to debug a lot of the parsing errors, as part of introducing class scarg.
- Have explicit named states of syscall:add(), and those are now called and working correctly.
- Grabbed makesyscalls.lua from FreeBSD-CURRENT, so anything I'm pulling over is CURRENT
Couple areas that need attention, and are definitely not a final implementation, but a good starting point.
Questions are:
Is everything looking as expected. Am I on the right track?
My scarg:add() function is inserting a nested table, which doesn't seem ideal. I'm having problems thinking of another way, with how Kyle originally handled 64-bit types.
On my class design, did you like wrapping the function data into scproto.lua, or do you want to stick with scarg and scret?
Are my states for syscall:add() what you meant, by explicit state names?
Week Three
- Added class scret, and using that in class syscall instead.
- Added the files for all the modules, attempting to find a better solution to config.
- Continued down that path, tracing through the original script, trying to move towards everything where it needs to be and data available at the right times.
Questions are:
On class scarg, I was wondering: Instead of it being invoked line by line with class syscall, what about it invoking its own subroutine so that it can capture all the args for the entry? Have that be the argument table, and then return control to class syscall when it's done. That way the arguments for the entry are more of a packaged unit, and aren't as coupled with class syscall.
Mentor Input:
(Warner) I took a quick look and noticed a quick thing: freebsd.lua needs to include the @generated tag in the output using the generated_tag somehow. And that likely needs to be a util routine that's just called to do it.
Response:
I added that in util. I'll add optional arguments to cope with syscalls.mk and systrace_args.c
util.generated_tag(str), with the string being the unique preamble for the file (e.g., "System call prototypes.").
Week Four
- Cleaned up a lot of the rougher edges on what I had so far.
- config is getting increasingly more sorted out, things are in more final places, less todo's and more functions I'm happy with.
- util.generated_tag() deals with all the edge cases.
Specifically, comment type is an optional parameter, and it can also handle multi-lines with newlines.
Problem:
This output:
util.generated_tag("System call argument to DTrace register array conversion\nThis file is part of the DTrace syscall provider")
/*
 * System call argument to DTrace register array conversion
 * This file is part of the DTrace syscall provider
 *
 * DO NOT EDIT-- this file is automatically @generated
 */
> util.generated_tag("FreeBSD system call object files.", "#")
#
# FreeBSD system call object files.
#
# DO NOT EDIT-- this file is automatically @generated.
#
It doesn't match the old script correctly, but (I think) it's good enough.
Answer: It's good enough.
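For illustration, a helper with this behavior (an optional comment type and a multi-line preamble) could be sketched as below. This is a hypothetical re-creation written for this page, not the util.generated_tag() in the repository, and its exact output may differ from the real function.

```lua
-- Hypothetical re-creation of a generated-tag helper; the real
-- util.generated_tag() in LSG may differ in signature and output.
local function generated_tag(str, comment)
	local open, mid, close
	if comment == nil then
		-- Default to a C-style block comment.
		open, mid, close = "/*", " *", " */"
	else
		-- Line-comment style, e.g. "#" for makefiles.
		open, mid, close = comment, comment, comment
	end
	local out = { open }
	-- Emit each line of a possibly multi-line preamble.
	for line in str:gmatch("[^\n]+") do
		out[#out + 1] = mid .. " " .. line
	end
	out[#out + 1] = mid
	out[#out + 1] = mid ..
	    " DO NOT EDIT-- this file is automatically @generated."
	out[#out + 1] = close
	return table.concat(out, "\n") .. "\n"
end

io.write(generated_tag("FreeBSD system call object files.", "#"))
io.write(generated_tag("First preamble line\nSecond preamble line"))
```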
- We're not doing the $FreeBSD$ anymore. Remove.
- Added class bsdio, to package some of the common write/read procedures, and also be the wrapper for a simpler Lua assert(write()) call.
Questions are:
I'm wondering if you guys like that idea or not. Also, I haven't decided if it should be a module or a class. I was thinking that having some sort of state could prove to be useful (i.e., class).
Created a new branch, "separating-modules", that's VERY WIP and just me trying to sort out what goes where.
That's probably the direction I'll keep going. I'd like to wait for feedback before messing more with what I've been working on.
Important Questions are:
What type(s) in syscalls.master are noncompat? Specifically, what is handle_noncompat() referring to?
I would like some suggestions on documentation and style. I found Kyle's blog on flua implying that style is still being figured out. Warner doesn't 100% follow style.lua(9).
How should I be documenting?
Tell me if I'm doing anything wrong style-wise
Mentor Input:
(Kyle) Good enough for now, at least- we can call that later refinement as we look at any diff between old/new generated files.
(Kyle) (To $FreeBSD$) Correct, any of that doesn't strictly need to be maintained.
(Kyle) (To bsdio) Sure, that's reasonable- a bit cleaner than seeing assert(foo:write()) everywhere, and one could conceivably push some of the file multiplexing into it as well.
(Kyle) Noncompat are more for active or largely non-special syscalls, those marked STD/NODEF/NOARGS/NOPROTO/NOSTD (see ncompatflags in the original script).
(Kyle) (To style) Yeah, so, just shoot for something reasonably close to one of the two (whichever feels more natural) and we can hash out what should/shouldn't be acceptable in style in review. re: documenting, preferably with a much higher density of comments than the original script had. A running blurb off to the side with a summary of what's changed would also be good to maintain.
Also, both Warner and Kyle provided more insight into style, which can be found on GitHub: https://github.com/bsdimp/lsg/pull/1
Response:
- Correct all of the inconsistencies in Warner's style, align the style to makesyscalls.lua, and also thoroughly comment and document any existing work and any future work.
- Meticulously align the style with style.lua(9) and style(9).
Week Four
A lot of discoveries in working through the modules, and the specific file generation.
- Cleaned up a lot of that work, so there's a more presentable representation of where I'm at. Merged it into the main branch of my fork.
Explained here: https://github.com/bsdimp/lsg/pull/1
Summary:
- Working through init_sysent, and also partially working through everything else (I want to be able to see the greater picture, but init_sysent is my focus). And then just commenting, documenting, cleaning-up style, trying to uncouple things as much as possible and have a clean interface.
Communication to Mentors:
I’m glad it’s looking good. Your comments were very helpful, so I appreciate the input. I’m not too worried whether they missed the point or not; it's good info to have. Some of that I’ve picked up in this later work, and am moving towards. This project has been a lot of pick and pull and find a better answer later, as refactoring tends to be.
Week Five
- Start at my main branch and concentrate that into something that would be a good base/consolidation point.
Mainly just address some remaining xxx and fully move away from the bitmasks
- Merged the work I've done towards consolidation into my main branch, to accomplish a good base/consolidation point.
- Took out the bitmasks entirely, decided on an answer to config, tested syscall_h against native, 32-bit, and amd64 linux.
- Comments.
Personal Thoughts:
Sometimes I misinterpret the code. This script is a big learning experience at points, and parts are not always clear to me, so I appreciate my mentors' feedback a lot.
It all seems to be working mostly correctly. I saw a comment got lost at a certain point, which offset things, which goes back to validating the numbers -- not currently implemented. But anyway, everything seems to be working, config included, reasonably well.
Currently, what I have is what I'd like to call the "early-this-week" consolidation point.
Midterm Evaluation Completed
Week Six
I’ll be keeping the main branch of my fork as it is, whenever there’s time for a review — to serve as a checkpoint.
- Reconstructing init_sysent and sysproto.
Doing both because, between the two, that will pretty much get everything in the library where it needs to be.
- Everything’s going mostly smoothly, but I’m not at a proper testing point. Trying to work through handling compat.
For testing, I’ve just been catting the output to a tmp file, or using the lua interpreter for small functions.
Mentor Input:
(Warner) I'd like to get all the stand-alone uses of this going so we can test the viability of the library. The final test will be to drop it into the build system and have it just work. I can help with build system integration.
Response:
Okay, great. I'm on the same page. Oooh, exciting!!!
Week Seven
- init_sysent is going well.
- Compat's working as expected, which was the challenging part. I'm having slightly off output, but it's looking pretty good.
The off output is just from a wrong case. (NOTE: It was NOT)
Personal Thoughts:
I'm expecting to finish init_sysent early in the week and move to sysproto. I need to fix that case and then test against the different syscalls.master.
Week Eight
init_sysent took longer than expected.
- I was able to sort out the issue with the wrong case.
It ended up being that for the syscall iterator a shallow copy is fine for the range, but in the case of a full syscall a deep copy is necessary with the now nested args table.
It’s still not perfect, but it’s mostly there.
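The difference between the two kinds of copy can be shown with a short sketch. The helpers below are hypothetical and simplified (no cycle or metatable handling); they only illustrate why a nested args table needs a deep copy.

```lua
-- Hypothetical helpers contrasting shallow and deep table copies.
-- Simplified: neither handles cyclic tables or metatables.
local function shallow_copy(t)
	local copy = {}
	for k, v in pairs(t) do
		copy[k] = v
	end
	return copy
end

local function deep_copy(t)
	if type(t) ~= "table" then
		return t
	end
	local copy = {}
	for k, v in pairs(t) do
		copy[k] = deep_copy(v)
	end
	return copy
end

-- An illustrative system call entry with a nested args table.
local sc = { name = "example", args = { { type = "int", name = "fd" } } }
local s, d = shallow_copy(sc), deep_copy(sc)

-- Mutating the original's nested table leaks into the shallow copy
-- (which shares the same args table) but not into the deep copy.
sc.args[1].name = "changed"
print(s.args[1].name, d.args[1].name)
```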
- Moved on to sysproto.h and that’s now also mostly there. Same with systrace_args.
- Put the code in place for syscalls.c, to have it there, but haven’t really done anything on it working.
Instead of continuing with temp files, like sysproto.h (which has a lot of temp files), I’m storing lines that can’t be written right away in a table with different indexes, which then unravels itself at the end.
This is more in line with the goals discussed with my mentor.
As the project deadline is getting closer, my plan is to have all the modules mostly working, and then address the small details to get them all the way there. So that’s what I’ll be continuing with this week.
Week Nine
Mentor Input:
(Warner) Do you have code we can start to integrate into the build system in a branch yet?
Response:
- Continued on with the previous plans. Everything’s mostly working. Have cleaned up a lot of things too. Not 100% happy on confirming everything’s correctly working. Going to continue to compare to confirm it is, or fix if it isn’t.
- I was really hoping that we could have a review, so I could get a gauge of the work I’ve done. Also, to have some instruction of things that you want/things that you don’t want, or if it looks good. To have some clarity to focus on and have a great contribution.
- (Very not expected) I had to do some moving last weekend, which left me with less time than I’d like. Also why I’m messaging now and not sooner.
EDIT: I just saw your email, so yes, but I would like to have you guys look at it first. The branch all my recent work is on is https://github.com/agge3/lsg, init_sysent
Final Submission
Mentor Input:
(Warner) Yes. I'd like that. Phabricator or Github Pull request are both options.
(Warner) Yes. I can meet any day this next week. This past week has been too crazy :(
Response:
- Let's do exactly that and get it done! Finishing up and preparing for final project submission.
Useful links
Adding Syscalls to FreeBSD:
https://wiki.freebsd.org/AddingSyscalls, so-you-want-to-add-a-system-call.pdf (AsiaBSDCon 2023), So you want to add a system call (Brooks Davis, video)
Brooks Davis’ recent commits:
libsys: don't try to expose yield
lib{c,sys}: expose _getlogin consistently
makesyscalls: generate private syscall symbols
makesyscalls: add COMPAT14 support
makesyscalls: don't make syscall.mk by default