
Playing with libclang

clang is a great compiler that I use daily in my job and in my personal projects. It works just great, and its error messages are much clearer than those of other compilers.

clang is also usable as a C++ parser library. This is of particular interest for two of my projects: vera++ and ITK wrappers.

vera++ is a C++ style checker that lets developers create their own checking rules in Tcl, Lua or Python. The rules have access to the tokens of the parsed C/C++ files, but nothing more. There is no easy way to know that a token is part of a function parameter, a type declaration, etc. clang should be able to provide all this extra information, which would be very useful for checking the coding style.

ITK is a large C++ image processing library whose code makes heavy use of C++ templates. A combination of gccxml, swig and an ITK-specific code generator is used to give access to the library from Python and Java, plus doxygen and another code generator to integrate the documentation into the generated code. Unfortunately, gccxml is not very well maintained and is always painful to fix when a new compiler comes out. It is indeed currently broken on my Mac.

clang should be able to provide the same kind of information. Even better, it should be possible to remove the XML intermediate step by using the clang library directly in the code generator. clang should also be able to give some information that gccxml can’t: the typedefs and the data fields of a class.

AST Dump

There are several ways to look at the AST generated by clang for a C++ file. The simplest is to use the flags -Xclang -ast-dump -fsyntax-only.

I’ve used this simple test.cpp:

template<typename TData>
class Foo
{
public:
  typedef Foo Self;
  typedef TData Data;
  Foo();
  // just a comment
  Data getData(int i, char const* s);
private:
  /// my precious data
  Data m_data;
};

typedef Foo<int> IntFoo;
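
The dump is then obtained with the following command, the file being saved as /tmp/test.cpp (the path that appears in the source locations below):

clang -Xclang -ast-dump -fsyntax-only /tmp/test.cpp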

Here is the result:

TranslationUnitDecl 0x1028242d0 <<invalid sloc>>
|-TypedefDecl 0x102824810 <<invalid sloc>> __int128_t '__int128'
|-TypedefDecl 0x102824870 <<invalid sloc>> __uint128_t 'unsigned __int128'
|-TypedefDecl 0x102824c30 <<invalid sloc>> __builtin_va_list '__va_list_tag [1]'
|-ClassTemplateDecl 0x102824dc0 </tmp/test.cpp:1:1, line:13:1> Foo
| |-TemplateTypeParmDecl 0x102824c80 <line:1:10, col:19> typename TData
| |-CXXRecordDecl 0x102824d30 <line:2:1, line:13:1> class Foo definition
| | |-CXXRecordDecl 0x102870c70 <line:2:1, col:7> class Foo
| | |-AccessSpecDecl 0x102870d00 <line:4:1, col:7> public
| | |-TypedefDecl 0x102870d40 <line:5:3, col:15> Self 'Foo<TData>'
| | |-TypedefDecl 0x102870da0 <line:6:3, col:17> Data 'TData'
| | |-CXXConstructorDecl 0x102870e60 <line:7:3, col:7> Foo<TData> 'void (void)'
| | |-CXXMethodDecl 0x102871100 <line:9:3, col:36> getData 'Data (int, const char *)'
| | | |-ParmVarDecl 0x102870f50 <col:16, col:20> i 'int'
| | | `-ParmVarDecl 0x102870ff0 <col:23, col:35> s 'const char *'
| | |-AccessSpecDecl 0x1028711e0 <line:10:1, col:8> private
| | `-FieldDecl 0x102871220 <line:12:3, col:8> m_data 'Data':'TData'
| |   `-FullComment 0x102871520 <line:11:6, col:22>
| |     `-ParagraphComment 0x1028714f0 <col:6, col:22>
| |       `-TextComment 0x1028714c0 <col:6, col:22> Text=" my precious data"
| `-ClassTemplateSpecializationDecl 0x102871280 <line:1:1, line:13:1> class Foo
|   |-TemplateArgument type 'int'
`-TypedefDecl 0x102871430 <line:15:1, col:18> IntFoo 'Foo<int>':'class Foo<int>'

So we have the definition of the template class Foo, with almost everything in it — template type, typedef, constructor, methods, etc. Even the comment associated with the field m_data is there — this may also be a chance to avoid a complex tool chain to incorporate the documentation in Python and Java.

However, it should be noted that some things are missing, at least in this dump:

  • the comment // just a comment is gone,
  • there is no trace of several characters, like ;, {, or } – they may be there internally, just not dumped,
  • there is no detail about the type Foo<int> declared in the typedef on the last line.

The first two points are a bit problematic for vera++, but it should be possible to match the AST nodes with the tokens and fill in the missing information. At least it means vera++ can’t work with clang alone, and must keep its current tokenizer.

The last point is a problem for the ITK wrappers, because this is exactly the information we need. It looks like a problem we had with gccxml: unless a type is used somewhere, it is not resolved. In ITK, we force the instantiation with a sizeof() of the type. So I added this at the end of my test.cpp:

void force_instantiate()
{
  sizeof(IntFoo);
}

and redump the AST:

$ clang -Xclang -ast-dump -fsyntax-only /tmp/test.cpp
/tmp/test.cpp:19:3: warning: expression result unused [-Wunused-value]
  sizeof(IntFoo);
  ^~~~~~~~~~~~~~
TranslationUnitDecl 0x1028242d0 <<invalid sloc>>
|-TypedefDecl 0x102824810 <<invalid sloc>> __int128_t '__int128'
|-TypedefDecl 0x102824870 <<invalid sloc>> __uint128_t 'unsigned __int128'
|-TypedefDecl 0x102824c30 <<invalid sloc>> __builtin_va_list '__va_list_tag [1]'
|-ClassTemplateDecl 0x102824dc0 </tmp/test.cpp:1:1, line:13:1> Foo
| |-TemplateTypeParmDecl 0x102824c80 <line:1:10, col:19> typename TData
| |-CXXRecordDecl 0x102824d30 <line:2:1, line:13:1> class Foo definition
| | |-CXXRecordDecl 0x102870c70 <line:2:1, col:7> class Foo
| | |-AccessSpecDecl 0x102870d00 <line:4:1, col:7> public
| | |-TypedefDecl 0x102870d40 <line:5:3, col:15> Self 'Foo<TData>'
| | |-TypedefDecl 0x102870da0 <line:6:3, col:17> Data 'TData'
| | |-CXXConstructorDecl 0x102870e60 <line:7:3, col:7> Foo<TData> 'void (void)'
| | |-CXXMethodDecl 0x102871100 <line:9:3, col:36> getData 'Data (int, const char *)'
| | | |-ParmVarDecl 0x102870f50 <col:16, col:20> i 'int'
| | | `-ParmVarDecl 0x102870ff0 <col:23, col:35> s 'const char *'
| | |-AccessSpecDecl 0x1028711e0 <line:10:1, col:8> private
| | `-FieldDecl 0x102871220 <line:12:3, col:8> m_data 'Data':'TData'
| |   `-FullComment 0x103000470 <line:11:6, col:22>
| |     `-ParagraphComment 0x103000440 <col:6, col:22>
| |       `-TextComment 0x103000410 <col:6, col:22> Text=" my precious data"
| `-ClassTemplateSpecializationDecl 0x102871280 <line:1:1, line:13:1> class Foo definition
|   |-TemplateArgument type 'int'
|   |-CXXRecordDecl 0x1028715d0 prev 0x102871280 <line:2:1, col:7> class Foo
|   |-AccessSpecDecl 0x102871660 <line:4:1, col:7> public
|   |-TypedefDecl 0x1028716a0 <line:5:3, col:15> Self 'class Foo<int>'
|   |-TypedefDecl 0x102871730 <line:6:3, col:17> Data 'int':'int'
|   |-CXXConstructorDecl 0x1028717c0 <line:7:3> Foo 'void (void)'
|   |-CXXMethodDecl 0x102871a20 <line:9:3, col:36> getData 'Data (int, const char *)'
|   | |-ParmVarDecl 0x1028718b0 <col:16, col:20> i 'int'
|   | `-ParmVarDecl 0x102871910 <col:23, col:35> s 'const char *'
|   |-AccessSpecDecl 0x102871ae0 <line:10:1, col:8> private
|   `-FieldDecl 0x102871b20 <line:12:3, col:8> m_data 'Data':'int'
|     `-FullComment 0x103000540 <line:11:6, col:22>
|       `-ParagraphComment 0x103000510 <col:6, col:22>
|         `-TextComment 0x1030004e0 <col:6, col:22> Text=" my precious data"
|-TypedefDecl 0x102871430 <line:15:1, col:18> IntFoo 'Foo<int>':'class Foo<int>'
`-FunctionDecl 0x1028714a0 <line:17:1, line:20:1> force_instantiate 'void (void)'
  `-CompoundStmt 0x102871b88 <line:18:1, line:20:1>
    `-UnaryExprOrTypeTraitExpr 0x102871b68 <line:19:3, col:16> 'unsigned long' sizeof 'IntFoo':'class Foo<int>'

So this time there are a few more things (in addition to the warning):

  • a FunctionDecl for the force_instantiate function,
  • a ClassTemplateSpecializationDecl corresponding to the IntFoo type, with every type resolved — great!

libclang

The dump seems to have everything needed, but it is certainly not easy to parse. And clang is made to be used as a library, so there should be a good API to access everything needed in the AST. In fact there are two options: use the stable API called libclang, or use the internal clang AST, which may change in the future. Based on that alone, I’m more tempted to try libclang.

In fact, there is even a Python module to use libclang — this is nice, because the ITK code generator is already written in Python. Using clang from Python may avoid recreating the generator from scratch.

To install the Python module, I just run:

pip install clang

Using it in Python requires a bit more work on Mac OS X. After import clang.cindex, the path to libclang must be configured. Either:

clang.cindex.Config.set_library_path(
  "/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib")

with Xcode, or

clang.cindex.Config.set_library_path("/Library/Developer/CommandLineTools/usr/lib")

with the command line tools. Then create the index and read the file:

index = clang.cindex.Index.create()
tu = index.parse("/tmp/test.cpp")

Note that libclang is fully silent when the index is created this way. This is the expected behavior when using libclang from a script, because the diagnostics are accessible through the API. When using it from the Python interpreter, however, it makes things more cumbersome, and it is very easy to miss that an error occurred while parsing the file. It is possible to make libclang print the problems by instantiating the index this way:

index = clang.cindex.Index(clang.cindex.conf.lib.clang_createIndex(False, True))

And of course it would be even better to be able to pass this option to the normal clang.cindex.Index.create().
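
Another option, whatever the way the index was created, is to read the diagnostics explicitly from the translation unit returned by index.parse(). A minimal sketch, with tu as in the snippet above:

for diag in tu.diagnostics:
    print diag.severity, diag.spelling, diag.location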

The API lets us iterate over all the nodes very easily. Dumping some of the information in the AST can be done this way:

#!/usr/bin/env python

# Dump the AST of the file given on the command line as an ascii tree.
import clang.cindex, asciitree, sys

clang.cindex.Config.set_library_path("/Library/Developer/CommandLineTools/usr/lib")
index = clang.cindex.Index(clang.cindex.conf.lib.clang_createIndex(False, True))
translation_unit = index.parse(sys.argv[1], ['-x', 'c++'])

# draw_tree() takes the root node, a function returning the children of a node,
# and a function returning the text to display for a node.
print asciitree.draw_tree(translation_unit.cursor,
  lambda n: n.get_children(),
  lambda n: "%s (%s)" % (n.spelling or n.displayname, str(n.kind).split(".")[1]))

Called on the test.cpp file, it produces:

$ python /tmp/dump.py /tmp/test.cpp 
/tmp/test.cpp (TRANSLATION_UNIT)
  +--Foo (CLASS_TEMPLATE)
  |  +--TData (TEMPLATE_TYPE_PARAMETER)
  |  +-- (CXX_ACCESS_SPEC_DECL)
  |  +--Self (TYPEDEF_DECL)
  |  |  +--Foo<TData> (TYPE_REF)
  |  +--Data (TYPEDEF_DECL)
  |  |  +--TData (TYPE_REF)
  |  +--Foo<TData> (CONSTRUCTOR)
  |  +--getData (CXX_METHOD)
  |  |  +--Data (TYPE_REF)
  |  |  +--i (PARM_DECL)
  |  |  +--s (PARM_DECL)
  |  +-- (CXX_ACCESS_SPEC_DECL)
  |  +--m_data (FIELD_DECL)
  |     +--Data (TYPE_REF)
  +--IntFoo (TYPEDEF_DECL)
  |  +--Foo (TEMPLATE_REF)
  +--force_instantiate (FUNCTION_DECL)
     +-- (COMPOUND_STMT)
        +-- (UNEXPOSED_EXPR)
           +--IntFoo (TYPE_REF)

Unfortunately, the template specialization is not there… For some reason, it is not exposed at all, even as an UNEXPOSED_EXPR.

Conclusion

So in the end, clang still seems to be a great tool, but its stable API, libclang, seems to be lacking some of the features I need. This is OK for vera++, and may be investigated more closely in the future. For ITK wrapping, I guess I have a few options:

  • fix libclang
  • use the internal AST to create the code generator
  • parse the AST dump
  • fix gccxml

I think I’ll go for the first.

Converting nanoblogger posts to Jekyll

So my old posts were written with nanoblogger, and I’m now using Jekyll. Fortunately, both use quite similar formats, so importing my old posts should not be too difficult.

Even better, there is already a tool for that. I had to modify it a little though, to avoid adding categories and other things I don’t need in the front matter, to use .md instead of .html as the file extension, and to remove the extra line break added after each line.

Here is my modified version:

# Script to convert a directory of Nanoblogger posts to Jekyll
#
# Nanoblogger is a command line, static blogging app, not that
# dissimilar to Jekyll: http://nanoblogger.sourceforge.net/
# 
# It's been years since I've used it though, but the below script
# worked for me in converting the files to Jekyll. 

Dir['*.txt'].each do |f|
	# Need to read file to find title
	title = ''
	lines = IO.readlines(f)
	lines.each do |l|
		if /TITLE/ =~ l 
			title = l[6..-1].strip # strip leading whitespace, trailing return
			break
		end
	end
	lines.slice!(0..lines.index("BODY:\n")) # Remove Nanoblogger front matter
	begin
		lines.slice!(lines.index("END-----\n")..-1) # Remove Nanoblogger end matter
	rescue
		lines.slice!(lines.index("END-----")..-1) # Might not have line break
	end
	# Add in Jekyll Yaml Front Matter
	# !Important note to self. I had already split the archive of posts into subfolders based
	# on category. So this script would be run on each subfolder and the category manually set 
	# below.
	lines.unshift("---\n", "title: #{title}\n", "---\n")
	# Replace data in file #IO.writelines?
	File.open(f,"w") do |file|
		lines.each do |l|
			file << l
		end
	end
	# Then rename file based on title
	title = title.gsub(/\s/,"-") # replace spaces, any other dodgy characters can manually fix
	title = "-"+title # Add leading -
	newf = f.gsub(/T.*/, title+".md")
	newf = newf.gsub(/\\|\/|:|\*|\?|\"|<|>|\|/, "_") # Windows safe filenames
	File.rename(f, newf)
end

Then I just had to copy all the .txt files from nanoblogger’s data directory to Jekyll’s _posts directory, run the script, and add a few tags where needed. Done!
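
For reference, with the nanoblogger data in ~/nanoblogger/data and the script above saved as nb2jekyll.rb at the root of the Jekyll site (both paths are just examples), the whole conversion boils down to:

cp ~/nanoblogger/data/*.txt _posts/
cd _posts
ruby ../nb2jekyll.rb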

Raspberry Pi As A Temperature Logger

I had been looking for some time for a good reason to buy a Raspberry Pi. Unfortunately I already have a nice nettop box that does its job very well as a media player with XBMC (soon Kodi), plus a few services (IRC bot, file sharing, …). OK, it is not as silent and is more power hungry than I’d wish, but I can’t just throw it away…

But I finally found the good reason I was looking for!

I want to log the temperature of a house several times a day for several weeks. The house has power but no internet access. Ideally, it would log the temperature both inside and outside the house. I’ve looked for a ready-to-use tool that could do the job at a low cost, but haven’t found anything really convincing.

This is the kind of task the Raspberry Pi can do very easily, with the help of a DS18B20 sensor.

Raspberry Pi A+

So I bought:

  • a Raspberry Pi A+ — I don’t need any USB port or ethernet connection, 256MB of RAM is more than enough, and the very low power consumption is always nice to have,
  • two DS18B20 sensors — one is waterproof with a cable and the other is not,
  • and a 4700 Ω resistor.

I already had:

  • a spare micro SD card,
  • a USB power supply (I have plenty of them),
  • a box,
  • four screws,
  • an old IDE cable to connect to the GPIO,
  • a switch from an old phone headset to use as a shutdown button.

Total cost: less than 40€.

The system, Raspbian Wheezy 2014-09-09, installed without any problem — I was even able to configure the bépo keyboard layout during setup. Great.

Here is how it looks:

[photos of the logger, with the box open]

Not a work of art, but it should do the job!

Temperature sensors

I’ve soldered the pull-up resistor on the IDE cable between pins 1 and 7, and pins 1, 2 and 3 of the DS18B20 to pins 9, 7 and 1 of the GPIO respectively.

The temperatures are read through the sysfs interface, which requires loading two extra kernel modules. To do that, I’ve added these two lines to the /etc/modules file:

w1-gpio
w1-therm
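
The entries in /etc/modules are only read at boot time; to load the modules immediately without rebooting, they can also be loaded by hand:

sudo modprobe w1-gpio
sudo modprobe w1-therm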

A Python script run by cron every ten minutes reads the temperature from the sensors and appends the output to a file:

#!/usr/bin/env python

import glob
from datetime import datetime

out = open("temp.csv", "a")
date = str(datetime.now())
# each DS18B20 appears as a directory named 28-<serial> under /sys/bus/w1/devices
ds = glob.glob('/sys/bus/w1/devices/28-*/w1_slave')
for d in ds:
    name = d.split("/")[-2]
    # the last word of w1_slave looks like "t=21875", i.e. the temperature in millidegrees
    temp = str(float(open(d).read().split()[-1][2:])/1000)
    print >> out, ";".join([date, name, temp])
out.close()

The date is the same for all the sensors, to make it easier to compare the temperatures. When doing fast acquisitions or using many sensors, it may be better to record the real date for each sensor, as each sensor takes a second or so to read.
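
A minimal variant of the loop above with a per-sensor timestamp would simply take the date right before reading each sensor:

for d in ds:
    name = d.split("/")[-2]
    date = str(datetime.now())  # timestamp taken just before reading this sensor
    temp = str(float(open(d).read().split()[-1][2:])/1000)
    print >> out, ";".join([date, name, temp])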

The crontab entry is:

*/10 * * * * /home/pi/temp.py

Here are a few lines from the generated temp.csv file:

2014-12-21 16:20:01.575703;28-001450767aff;21.875
2014-12-21 16:30:02.006573;28-001450767aff;21.625
2014-12-21 16:40:02.116337;28-001450767aff;22.125

There is a single sensor for now – I’m still waiting for the waterproof one. And here is a chart of the data over a few hours:

[chart of the logged temperatures]

The data is in CSV format, so it can easily be imported anywhere. I’ve used R:

data <- read.table("temp.csv", sep=";", col.names=c("date", "sensor", "temperature"))
data$date <- as.POSIXct(as.character(data$date))
plot(temperature ~ date, data)

Shutdown button

Without a screen, a keyboard, or any kind of connection, it is not possible to shut down the Raspberry Pi properly. Fortunately it has many inputs on the GPIO that can be used just for that. I’ve soldered a switch between pin 40 (GPIO21) and pin 39 (ground).

A Python script launched from /etc/rc.local just waits for the input to change, and then initiates a clean system shutdown.

#!/usr/bin/env python

import RPi.GPIO as GPIO
import os

GPIO.setmode(GPIO.BCM)                             # use the GPIO numbers, not the pin numbers
GPIO.setup(21, GPIO.IN, pull_up_down=GPIO.PUD_UP)  # GPIO21 reads 1 until the switch pulls it to ground
GPIO.wait_for_edge(21, GPIO.BOTH)                  # block until the value changes on the pin
os.system("logger 'power button pressed'")
os.system("halt")

Note that it wasn’t possible to use the pin numbers with the RPi.GPIO module installed by default on my Raspbian. It failed with this error:

ValueError: The channel sent is invalid on a Raspberry Pi

I had to run setmode(BCM) instead of setmode(BOARD) to avoid that problem, and use the GPIO number instead of the pin number:

[diagram: Raspberry Pi A+/B+ GPIO pinout]

The switch, once pushed, connects GPIO21 to the ground, so GPIO21 has to be configured with a pull-up resistor. When reading the value with input(21), the value returned is 1 when the switch is not pressed and 0 when it is pressed. This may seem a bit counterintuitive, but it simply describes the current state of GPIO21: 1 for the 3.3V coming from the pull-up resistor, 0 for the 0V coming from the ground.

Instead of reading the value from time to time and triggering the shutdown when it is 0, the script just uses wait_for_edge(21, BOTH), which blocks until the value changes on the pin and returns when that happens.
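
For comparison, here is a rough sketch of the polling approach, which wakes up every second for nothing and reacts more slowly:

import time
import RPi.GPIO as GPIO

GPIO.setmode(GPIO.BCM)
GPIO.setup(21, GPIO.IN, pull_up_down=GPIO.PUD_UP)
while GPIO.input(21):  # 1 while the switch is open, 0 once it is pressed
    time.sleep(1)
# then log and halt, as in the script above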

A message is sent to syslog with the logger command, to keep a trace that the button was pressed.

The line in the file /etc/rc.local is just:

/home/pi/shutdown.py &

The & is mandatory to run the script in the background; without it, rc.local would never finish and the end of the boot sequence would hang.

Lack of clock battery and network

I already knew the Raspberry Pi A+ has no network interface. This is fine for this application, because the house has no internet connection anyway.

It is not convenient for the setup though, where everything must be done with a screen and a keyboard, or transferred to the SD card before booting the Raspberry Pi.

Another problem I overlooked is the lack of a battery for the clock. This means the Raspberry Pi can’t keep its clock running when it is not powered. Raspbian deals with that by restoring the date saved the last time it was shut down. With a network, the date and time are updated with NTP. Without one, the only way to update the date is to use a command like:

sudo date -s "december 21 2014 17:29"

Obviously, the only (easy) way to run that is with a screen and a keyboard, and I won’t have any when I plug the Raspberry Pi in at that house.

Again, a network interface would have helped, as I could have used my laptop to set the date through SSH.

I could buy a USB WiFi adapter, but the total cost would then be quite close to the price of the Raspberry Pi B+, and WiFi is a lot more painful to configure than a simple ethernet connection.

I guess I’ll just have to set the clock to the approximate date at which it will be booted, before unplugging it.

A power failure will also shift the times recorded with the temperatures. I’m not sure how much it matters, but to be sure to know when a power failure occurs, I’ve added a command that renames the temperature file when the Raspberry Pi reboots:

mv temp.csv temp.csv.`date +%Y-%m-%dT%H.%M.%S`

This is run by cron with the special string @reboot.
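
Assuming the file lives in /home/pi, the corresponding crontab entry looks like this (the % characters have a special meaning in a crontab and must be escaped):

@reboot mv /home/pi/temp.csv /home/pi/temp.csv.`date +\%Y-\%m-\%dT\%H.\%M.\%S`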

So a Raspberry Pi A+ is fine for the application itself, but a network interface would have helped quite a lot to configure everything and to experiment with the temperature logging. I think I’ll buy a B+ for my next project!

That’s it!

Now I’m waiting for the waterproof sensor to arrive. In the meantime, I’m letting the logger accumulate some real temperatures that I can use to try to extract some interesting information.

Back to blogging, with Jekyll

I’ve been quite busy in the last few years — two children (conceived by IVF), a new job, two new homes — which didn’t leave me much time for blogging. The lack of a stable IP address at my new homes finally took my self-hosted blog completely offline.

I keep trying a few fun things though, things I can’t necessarily find in all the amazing blogs on the internet. I feel I should make them available somewhere, instead of keeping them to myself.

Things have changed quite a lot since my last attempt at blogging. Most notably, my blogging tool, nanoblogger, is gone. But there are plenty of alternatives! I’ve chosen Jekyll with Lanyon. The theme is nice and clean, and it integrates very easily with GitHub Pages — goodbye hosting problems! It is also a lot faster than nanoblogger, and I feel right at home with the Markdown syntax and the directory layout.

The few missing things were found on Joshua Lande’s blog or created from scratch.

What was done can be found in my GitHub repository, along with a few steps that may be useful for others.

Next step: get my old posts back from nanoblogger.

Goodbye, ogg

Finally, I give up. I’ve been using ogg to store most of my music for several years, but I’m tired of not being able to simply play my music anywhere other than on my computer. There are so many players everywhere! In the car, in my pocket, in my phone, … MP3 is too well established to do without it.

So, to keep it simple, I’m converting all the ogg files I have to mp3. I’ll re-rip proper mp3 files later, when time permits. I found a nice and simple tool for this task: ogg2mp3. With a command like this one

find . -name '*.ogg' -print0 | time xargs -P8 -0 -l /tmp/ogg2mp3 -f --enc_opts --preset standard -S -- --dec_opts -Q --

a little more than 1000 ogg files are converted to mp3 in 40 minutes on an 8-core Nehalem at 2.96 GHz. The hardest part was making the decision…