Interested in a software project topic, research project topic, or a thesis topic? Our topic areas include embedded and distributed systems, Internet of Things, virtualization and cloud computing, middleware and operating systems, performance engineering, software verification, program analysis, programming languages and tools, model-driven development, and more.
Naturally, we will tailor the assignments to your skills and interests, just contact us to get started. Also, do not hesitate to contact us if you have your own idea for a topic that is related to our topic areas.
Or, just take a look at the topic suggestions below, we can start from there:
- Theory Plugins into MCSat-based SMT solver Yaga (Martin Blicha, Jan Kofroň)
- Regression Benchmarking (Petr Tůma, Lubomír Bulej, Vojtěch Horký)
- Modern JVM Benchmark Suite (Petr Tůma, Lubomír Bulej, Vojtěch Horký)
- Integrating data in programming systems (Tomáš Petříček)
- History and philosophy of programming (Tomáš Petříček)
- Design of new interactive programming systems (Tomáš Petříček)
Theory Plugins into MCSat-based SMT solver Yaga
- Target
- Bc. thesis, Master thesis or Research project
- Contact
SMT solvers are the underlying reasoning engines for a wide range of automated reasoning techniques in the verification of software, protocols, distributed systems, etc. Most of the state-of-the-art SMT solvers are based on CDCL(T) framework. MCSat [1,2] represents a generalized approach which has, however, not been thoroughly explored yet. MCSat is particularly attractive for the applications in interpolation-based model checking [3], a popular technique for automated software verification. The goal of this work is to extend the current version of Yaga SMT solver [5] being developed in the Formal Methods Group at D3S with new modules and functionality. This involves support for theories beyond Linear Real Arithmetic, in particular Linear Integer Arithmetic, Non-Linear Arithmetic, and Uninterpreted Functions. Additional desired extensions include support for incrementality and Craig interpolation. The tool is to be evaluated on the benchmarks from SMT-LIB [4] and compared to the existing state-of-the-art solvers.
Depending on the type of the thesis, a particular extension or a subset of the extensions above will be selected.
- de Moura, L., Jovanović, D. (2013). A Model-Constructing Satisfiability Calculus. In: Giacobazzi, R., Berdine, J., Mastroeni, I. (eds) Verification, Model Checking, and Abstract Interpretation. VMCAI 2013. Lecture Notes in Computer Science, vol 7737. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35873-9_1
- D. Jovanovic, C. Barrett and L. de Moura, “The design and implementation of the model constructing satisfiability calculus,” 2013 Formal Methods in Computer-Aided Design, 2013, pp. 173-180, doi: 10.1109/FMCAD.2013.7027033.
- Jovanović, D., Dutertre, B. (2021). Interpolation and Model Checking for Nonlinear Arithmetic. In: Silva, A., Leino, K.R.M. (eds) Computer Aided Verification. CAV 2021. Lecture Notes in Computer Science(), vol 12760. Springer, Cham. https://doi.org/10.1007/978-3-030-81688-9_13
- SMT-LIB
- Yaga MCSat-based SMT Solver
Regression Benchmarking
- Contact
Regression benchmarking refers to the practice of evaluating performance of individual software releases with performance tests, much as it is done with functional tests in functional regression testing. The following topics are related to our ongoing research activities in regression benchmarking:
Evaluating regression detection accuracy
In the past years, we have collected a large database of measurements, partially annotated with information about performance regressions. This task is about coming up with data analysis algorithms, including possible applications of deep learning approaches, to identify performance regressions in the measurements, and evaluating the detection accuracy and efficiency.
Medium size project for people interested in data analysis and real life software performance, suitable as individual research project, bachelor or master thesis. Some knowledge of statistics and general experimental methodology is useful.
Alternatively, the task can be about implementing innovative approaches to detection research, for example in the form of an API that would permit crowdsourcing the data analysis.
Small to medium size project for people interested in programming, suitable as individual research project, bachelor or master thesis. Knowledge of Python and Django, plus API technologies in general, is useful.
Seeking regression root causes
Discovering the root cause (reason) of a particular performance regression is usually a manual task. This task is about coming up with methods and tools for easing this task. Of particular interest is the ability to identify incidental performance changes that do not require developer attention.
Medium to large size project for people who enjoy researching difficult open problems, suitable as individual research project or master thesis. Knowledge of compilers and other programming tools, and computer architectures, is useful.
Assessing performance test coverage
Although the concept of test coverage is well defined for functional tests, an analogous concept for performance tests is still missing. This task is about coming up with ways to characterize performance test coverage.
A medium to large size project for people who enjoy open research topics, suitable as individual research project or master thesis. Knowledge of compilers and general working of computers is useful.
Modern JVM Benchmark Suite
- Contact
The development of modern compilers and programming language runtimes relies on empirical measurements to assess application performance. These measurements require benchmark suites that are reasonably compact but also represent practical applications. One such suite is Renaissance.
Renaissance is in constant development and provides many contribution opportunities:
Extending the suite with benchmarks in other languages
Java Virtual Machine can execute a spectrum of languages supported by modern dynamic compilers (C, R, JS, Ruby and so on). Renaissance currently only supports Java and Scala, this task should extend the build system and include more workloads.
Medium size project for people who like coding in diverse environments, knowledge of Java, Scala build system, and build systems in general is useful.
Extending set of existing workloads with AI-based workload
The most important aspect of Renaissance suite is that it provides a diverse set of workloads. In this task, we would like to extend the suite with a new workload from the world of artificial intelligence. The workload must be a reasonably isolated representative of a typical AI task.
Small to medium size project for people interested in both AI and performance engineering. Knowledge of Java, Scala, and AI frameworks is useful.
Augmenting the existing workloads with validation
In the context of compiler development, execution correctness is not necessarily guaranteed. To detect incorrect compiler or runtime behavior, some workloads in Renaissance validate their own results. This task should extend validation to all existing Renaissance workloads.
Small to medium size project for people who like coding in diverse environments, knowledge of Java, Scala, and frameworks such as Spark is useful.
Analyzing the workload coverage
It is essential that the benchmark workloads are diverse enough to approximate many applications, but also compact enough to execute in relatively short time. This task should come up with ways to analyze the coverage of the compiler or runtime by the benchmark.
A medium to large size project for people who enjoy open research topics, knowledge of compilers and general working of computers is useful.
Integrating data in programming systems
- Target
- Bc. thesis, Master thesis or Research project
- Contact
Code written by data scientists to explore data (in Jupyter Notebooks and similar systems) is interesting for a number of reasons. First, it often works with concrete data sources and so you can more closely integrate data into programs. Second, it often interleaves execution and programming (you write a bit of code, run it and then continue editing). The following ideas for projects around programming with data use those two ideas in one way or another.
-
Analysing data science code - First of all, it would be interesting to know how exactly data analytical code looks like. The idea of this project is to scrape code from GitHub and capture the differences between code written to analyze data (e.g. Python in Jupyter Notebooks) and normal source code (e.g. Python libraries and applications). In what ways are the two different?
-
Data integration via Type Providers - One interesting way of integrating data into programming languages is using type providers (see F# Data library and TheGamma project). Type providers generate types, based on some schema or logic, that a programmer can then use to access data. Related project may include implementing type providers for interesting data sources (e.g. Semantic Web and Linked Data) and also add support for type providers to other languages (e.g. TypeScript).
-
AI-based tools for data exploration - Tools like Copilot help programmers write code - but how can they most effectively be integrated with tools for data exploration? One possible idea is to see how to use the underlying machine learning models to semi-automatically construct code for accessing data (possibly using the above type providers) that the user is interested in, based on natural language queries.
-
Data visualizations to encourage critical thinking - How can we visualize data so that the result makes viewers think more critically about what they see? A nice example of this is the You Draw It visualization by New York Times. How can we built other visualizations like this? And could we also encourage readers to critically think about model behind the data (e.g. for Agent-based economic models)?
References
- FSharp.Data: Data Access Made Simple - type provider library
- The Gamma: Democratizing data science - online playground
- The Gamma Programmatic Data Exploration for Non-programmers (PDF)
- You Draw It: What Got Better or Worse During Obama’s Presidency - nice data vis!
- Fluid: data-linked visualisations - explorable data visualizations
History and philosophy of programming
- Target
- Bc. thesis or Master thesis
- Contact
Understanding the full complexity of programming requires an inter-disciplinary approach that is aware of the history and origins of computing concepts, understands their socio-technical nature and the historical developments of the field. If you are interested in looking at programming from a broader historical and philosophical perspective, there are many potential topics to work on:
-
Recovering ideas from past programming systems - There are many past programming systems that have been forgotten (Commodore 64 BASIC, Hypercard, Boxer, LISP machines), but have interesting ideas in them that could be adapted and used in programming. A project could document such systems using historical sources and reimplement some of the ideas.
-
Document how programming concepts evolve - Concepts like types, processes, objects or events have existed in many different programming environments over time (but also in logic or linguistics). The idea is to document the meanings of such concepts in the past, see how the meaning changes and how new technologies, formalisms or languages shape the meaning of our concepts (see also Lakatos (1976)).
The above are just two example ideas - if you find them interesting, please get in touch! A project like this would put more emphasis on rigorous working with historical sources and quality of writing and may be more suitable for students with some relevant background or existing interest in the topic. In other words, it is less technical but equally challenging as more technical work.
References
- The Lost Ways of Programming: Commodore 64 BASIC - online reconstruction
- What we talk about when we talk about monads (PDF)
- For more ideas, see HaPoP 5 abstracts and Computing and Programming in Context
- Imre Lakatos (1976). Proofs and Refutations: The Logic of Mathematical Discovery
Design of new interactive programming systems
- Target
- Bc. thesis, Master thesis or Research project
- Contact
Programs are created not just by writing code, but by interacting with rich stateful programming systems or developer environments. Programming systems still include code, but they also encompass live interactive previews, structure editors, runtime environment and other developer tools. Thinking about programming systems, rather than just languages, can give us new ideas about making programming easier, trustworthy and accessible.
The ideas below are open-ended. They can be used as a starting point for designing a novel programming system. For a project or thesis, we could focus on one or two of those, build a prototype for a suitable application domain and evaluate the result in an appropriate way, e.g. by formailizing the design or developing substantial case study.
If you are interested, please get in touch to discuss the details!
-
Simplifying programming by working with concrete values - When we program, we typically work with abstract variables. But what if we were writing code only once we have concrete values for all our variables? That way, the programming environment could help us by looking at the concrete (sample) inputs. Something like this already happens in a number of places. A debugger may let you inspect values and edit code, a Jupyter Notebook lets you load concrete data before writing more code, and the Darklang project uses this idea when writing REST handlers (see Office Sign-in Application demo). The idea is to design a programming system that lets you easily construct programs by starting form concrete values.
-
Understanding programs by treating them as sequences of interactions - When programming, we invoke refactoring tools, copy code snippets, write and run tests or paste code into a REPL to test it, but the programming environment does not know about this! The idea of this project is to represent programs not as source code but as sequences of interactions with environment. This allows many interesting things. The programming environment can give you better hints (based on the result of running a code snippet or a test), merge conflicts get easier (renaming a variable in many places is just a single edit) and you can display a single program in multiple different views. The idea would be to develop a prototype programming system and explore capabilities of this new representation. See also the Histogram project for a few more ideas.
-
Bringing together program evaluation and editing - As a variation on the above, it would also be possible to use the “sequence of interactions” idea not just to represent a program, but also its execution. That is, when a program runs, the execution would generate a sequence of edits that enact the evaluation step-by-step. This would make debugging easier (you can manually replay programs) and allow you to define logic by something like “macro recording”. The idea may be applicable, for example, to programming education (you can manually simulate execution to understand it better).
-
Explainability via meta-programming over execution traces - In LISP, code is data and so you can write code that manipulates code (meta-programming). But what if the execution trace of a program was also data, accessible to the program? This way, you could write programs that analyze (or even modify) their own execution. This can be used for explainable data visualizations (see Fluid: data-linked visualisations), to track provenance (explain what inputs contributed to what outputs) or analyze test failures (what part of execution invalidated the test assumption).
-
Tree-based Turing machine models of programming systems - Languages like Subtext represent programs as documents - tree structures that include both data and code (and possibly also the execution trace as in the project above). Execution of such programs is done by rewriting the document (tree). The idea of this project is to define minimal formal model of such execution - a “Turing machine” that operates on a tree (can shift to parent/child nodes and rewrite values in nodes) and use the formalism to define the meaning of a small Subtext-like programming language.
References
- Darklang and more videos about it
- Histogram: You have to know the past to understand the present - online demo
- Subtext: uncovering the simplicity of programming - project homepage
- Fluid: data-linked visualisations - online demo
- Linked Visualisations via Galois Dependencies (PDF)