Basics of Protocol Buffers 3 – Part 2

This post is part 2 of the basics of Protocol Buffers 3. It focuses on how to work with the following.
1. multiple messages in the same file
2. nested messages
3. imports
4. packages

If you would like to read part 1, please read this post.

Multiple messages in the same file

You can have multiple messages in the same .proto file.
All you need to do is define those messages and reference them properly. Here is an example.

syntax = "proto3";

message Person {
    string first_name = 1;
    string last_name = 2;
    Address address = 3;
}

message Address {
    string street = 1;
    string city = 2;
    string zip_code = 3;
}

Nested Messages

You can nest messages depending on your design. You might want to nest messages to avoid naming conflicts (you can also use packages for this) or to keep a type local to the message that uses it. You can nest types as deep as you want, and a nested type can be referenced from outside its parent as Parent.Child (here, Person.Address). Let’s take a look at an example.

syntax = "proto3";

message Person {
    string first_name = 1;
    string last_name = 2;
    
    message Address {
        string street = 1;
        string city = 2;
        string zip_code = 3;
    }
    
    Address address = 3;
}
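
As a loose analogy, nesting scopes a type name much like a nested class does in Python. The sketch below is plain Python, not the code the protobuf compiler generates, and the dataclasses are hypothetical stand-ins for the two messages. It only shows that the inner type is reached as Person.Address from outside.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    """Plain-Python stand-in for the Person message (not generated code)."""

    @dataclass
    class Address:
        """Nested type: from outside it is referred to as Person.Address."""
        street: str = ""
        city: str = ""
        zip_code: str = ""

    first_name: str = ""
    last_name: str = ""
    # Inside Person, the short name Address is enough.
    address: Address = field(default_factory=Address)


# Outside Person, the nested type needs its parent as a prefix.
home = Person.Address(street="1 Main St", city="Springfield", zip_code="12345")
person = Person(first_name="Ada", last_name="Lovelace", address=home)
print(Person.Address.__qualname__)
```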

Import Messages

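It is very reasonable to keep messages in a different file and reuse them. This is especially the case when you need to use a .proto file created by another team. We have two files for the example. Please note that an import path is resolved relative to the directory you pass to the compiler as the proto path (by default, the directory you run it from), not relative to the importing file. So when you compile from the parent of "info", you need to include the folder in the import even though both files are in the same directory.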

// this is person.proto located under "info" directory.
syntax = "proto3";

import "info/address.proto";

message Person {
    string first_name = 1;
    string last_name = 2;
    Address address = 3;
}

// this is address.proto located under "info" directory
syntax = "proto3";

message Address {
    string street = 1;
    string city = 2;
    string zip_code = 3;
}
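
To see why the folder is part of the import string, here is a simplified Python model of how an import could be looked up. It assumes each search root plays the role of a proto path directory and that the import string is simply joined onto each root. The function name resolve_import and the temporary layout are made up for illustration, and the real compiler does more than this.

```python
from pathlib import Path
import tempfile


def resolve_import(import_path, proto_paths):
    """Return the first existing file for import_path under the given roots.

    Simplified model: the import string is joined onto each root in turn.
    """
    for root in proto_paths:
        candidate = Path(root) / import_path
        if candidate.is_file():
            return candidate
    return None


# Build a throwaway layout: <root>/info/person.proto and <root>/info/address.proto
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "info").mkdir()
    (root / "info" / "address.proto").write_text('syntax = "proto3";\n')
    (root / "info" / "person.proto").write_text('syntax = "proto3";\n')

    # Searching from the project root: the folder must be part of the import.
    found_with_folder = resolve_import("info/address.proto", [root])
    found_without_folder = resolve_import("address.proto", [root])
    # Using the info directory itself as a root makes the short form resolve.
    found_from_info_root = resolve_import("address.proto", [root / "info"])

print(found_with_folder is not None)
print(found_without_folder is not None)
print(found_from_info_root is not None)
```

In this model, the short form only works when the "info" directory itself is one of the search roots.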

Packages

It is important to define the package your protocol buffer messages live in, for the following reasons.
1. The generated code is placed in the package (or namespace) you specified when you compile, although the exact mapping depends on the target language
2. It prevents name conflicts between messages

Packages help the generated code fit each target language’s own namespace rules. When you import a .proto file that declares a package, prefix the imported message with the package name. Let’s take a look at the example below.

// city.proto located under "info" directory
syntax = "proto3";

package city;

message City {
    string name = 1;
    string country_code = 2;
}

// address.proto located under "info" directory
syntax = "proto3";

package address;

import "info/city.proto";

message Address {
    string street = 1;
    city.City city = 2;
    string zip_code = 3;
}

// person.proto located under "info" directory
syntax = "proto3";

package person;

import "info/address.proto";

message Person {
    string first_name = 1;
    string last_name = 2;
    address.Address address = 3;
}
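
The way packages prevent name conflicts can be sketched with a small registry keyed by fully qualified name. This is an illustrative Python model, not part of any protobuf library: the registry, the helper names and the "billing" package are all hypothetical.

```python
def qualified_name(package, message):
    """Join a package and a message name the way a .proto reference is written."""
    return f"{package}.{message}" if package else message


# Hypothetical registry of message types keyed by fully qualified name.
registry = {}


def register(package, message):
    """Add a type to the registry, rejecting a duplicate qualified name."""
    name = qualified_name(package, message)
    if name in registry:
        raise ValueError(f"duplicate type name: {name}")
    registry[name] = (package, message)
    return name


register("city", "City")
register("address", "Address")
register("person", "Person")
# A second Address in another (made-up) package does not clash with address.Address.
register("billing", "Address")

print(sorted(registry))
```

Without packages, both Address messages would share the bare name Address and the second registration would be rejected.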

Conclusion

We have taken a look at some simple but pretty useful features of protocol buffers. The next post will explain how to compile .proto files and use them in Python and Go.

Basics of Protocol Buffers 3 – Part 1

In this post, I am going to explain the basics of Protocol Buffers 3. Protocol Buffers is a data format developed by Google for better handling of structured data. There are many data formats, such as CSV and JSON, but each has weaknesses. CSV is easy to handle, but data types have to be inferred and parsing gets hard when the data includes commas. JSON is used in many places, travels well over the web and is very flexible, but it doesn’t enforce a schema, and JSON objects can get quite large because keys are repeated.

Advantages of Protocol Buffers

  • Data is fully typed
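  • Data is encoded in a compact binary format (not compressed in the gzip sense), which typically means smaller payloads and less CPU spent on parsing than with text formats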
  • Schema is required to generate code and read the data
  • Documentation can be part of the schema
  • Supports multi-language communication: data can be shared between programs written in different languages (Java, Python, Go, JavaScript and others)
  • Schema can evolve over time in a safe way
  • Code is generated automatically for convenience

Disadvantages

  • Not all languages are supported
  • Since data is serialized in a binary format, you can’t read the data file in a text editor

Example

This is an example of protocol buffer schema and we will take a look at each piece.

syntax = "proto3";

message Person {
  int32 age = 1;
  string first_name = 2;
  string last_name = 3;
  bytes profile_img = 4;
  bool verified = 5;
  float height = 6;
  repeated string contacts = 7;
}

Schema

Put syntax = "proto3"; as the first non-comment line of the file to indicate that it uses Protocol Buffers 3. If you want to use version 2, write "proto2" instead; the compiler also assumes proto2 when the line is missing.

Each schema starts with the keyword message, followed by the message name and a pair of braces.

In the schema, you can have multiple fields. Each field consists of a field type, a field name and a tag. The first word is the field type, such as int32, string, bytes, bool or float in the example. Next is the field name, which you can choose freely; it is mainly there for readability. The last one is the tag, which matters more than the field name because it is what identifies the field in the encoded data. Let’s take a look at each part.

Field Types

There are multiple built-in types supported in proto3. I will not explain much about each type, as they look very similar to those in other languages such as C/C++ and Java.

Integers
type: int32, int64, uint32, uint64, sint32, sint64
Floating Point Numbers
type: float (32 bits), double (64 bits)
Boolean
type: bool
String

A string must always contain UTF-8 encoded or 7-bit ASCII text.
type: string

Bytes

Raw byte array. Interpretation of bytes depends on the code.
type: bytes

Repeated Fields

Protocol buffers support lists or arrays through the “repeated” keyword. A repeated field can hold any number (0 or more) of elements. The keyword goes in front of the element type you want to use. Please refer to the contacts field in the example above.
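
To make "0 or more elements" concrete, here is a hand-rolled Python sketch of how a repeated string field such as contacts (tag 7) is laid out in the encoded data: each element is written as its own key, length and UTF-8 bytes. The helper names and the sample addresses are made up for illustration, and in practice the generated code does this for you.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


LENGTH_DELIMITED = 2  # wire type used for strings, bytes and embedded messages


def encode_repeated_string(field_number, values):
    """Write each element as key + length + UTF-8 bytes, one after another."""
    key = encode_varint((field_number << 3) | LENGTH_DELIMITED)
    out = bytearray()
    for text in values:
        data = text.encode("utf-8")
        out += key + encode_varint(len(data)) + data
    return bytes(out)


encoded = encode_repeated_string(7, ["a@x.io", "b@x.io"])
empty = encode_repeated_string(7, [])
print(encoded)
print(len(empty))
```

An empty list produces no bytes at all, which is why a repeated field with zero elements costs nothing in the encoded message.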

Enums

If a field can only take values that are known in advance (e.g., day of week), you can use an enum type.
Please note that the first value of an enum must be numbered 0, and that value is the default. Here is an example of an enum. Once it is defined, you can use the enum type just like any other field type.

enum DayOfWeek {
  UNDEFINED = 0;
  MONDAY = 1;
  TUESDAY = 2;
  WEDNESDAY = 3;
  THURSDAY = 4;
  FRIDAY = 5;
  SATURDAY = 6;
  SUNDAY = 7;
}
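
The default-value rule can be mirrored in plain Python with IntEnum. This is only an analogy for the schema above, not the code the compiler generates: a field that was never set reads back as the value numbered 0.

```python
from enum import IntEnum


class DayOfWeek(IntEnum):
    """Plain-Python mirror of the DayOfWeek enum (not generated code)."""
    UNDEFINED = 0
    MONDAY = 1
    TUESDAY = 2
    WEDNESDAY = 3
    THURSDAY = 4
    FRIDAY = 5
    SATURDAY = 6
    SUNDAY = 7


# An unset enum field behaves like the value numbered 0.
default_day = DayOfWeek(0)
print(default_day.name)
```

Keeping an explicit UNDEFINED entry at 0 lets you tell "never set" apart from a real day.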

Tag

In protocol buffers, field names are not important because they are not used in the encoded data. Instead, the tag is used, which makes it a very important element. In the example above, every field name is followed by a value. Those values are tags. The smallest value you can use is 1 and the largest is 2^29 - 1, or 536870911.

Tags numbered from 1 to 15 take 1 byte to encode. It is recommended to use them for frequently populated fields.
Tags numbered from 16 to 2047 take 2 bytes.
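
The 1-byte and 2-byte boundaries follow from how the field key is encoded: the tag is shifted left by 3 bits, combined with a 3-bit wire type, and written as a varint that carries 7 bits per byte. A small Python sketch (the function names are my own) reproduces the numbers:

```python
def varint_size(value):
    """Number of bytes a non-negative integer takes as a base-128 varint."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def key_size(field_number, wire_type=0):
    """Size in bytes of the field key, (field_number << 3) | wire_type."""
    return varint_size((field_number << 3) | wire_type)


for number in (1, 15, 16, 2047, 2048):
    print(number, key_size(number))
```

Tags 1 and 15 come out at 1 byte, 16 and 2047 at 2 bytes, and 2048 is the first tag that needs 3 bytes.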

Please note that the numbers 19000 through 19999 are reserved for the protocol buffers implementation and cannot be used in your own schemas.
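
Put together, the rules for picking a tag can be written as a small check. This is a sketch of the constraints described here, with made-up names; the protobuf compiler performs its own validation when you compile a schema.

```python
MAX_FIELD_NUMBER = 2**29 - 1           # largest allowed tag
RESERVED_RANGE = range(19000, 20000)   # 19000 through 19999


def is_usable_field_number(number):
    """True if number is within 1..2^29-1 and outside the reserved block."""
    return 1 <= number <= MAX_FIELD_NUMBER and number not in RESERVED_RANGE


for candidate in (0, 1, 18999, 19000, 19999, 20000, MAX_FIELD_NUMBER + 1):
    print(candidate, is_usable_field_number(candidate))
```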

Conclusion

We have taken a look at the very basics of protocol buffers. Please continue to read this post for more about protocol buffers.