Thursday 3 February 2022

Reusing Community Documentation - Part 1

For any writer working on a product based on an open source project, the question arises as to whether you should attempt to reuse content created by the open source community. Although reusing community content poses certain challenges, it can also bring great strategic benefits. Not only does this approach provide you with a head start when creating the first iteration of your documentation, it also makes the content more maintainable, as you can easily merge in updates made by the developer community.

Content reuse is not all plain sailing. As a technical writer working on product documentation, you would be lucky if you were in a position to decide exactly how to structure the open source / community version of the documentation you are reusing. In most cases, there is divergence—and a certain amount of tension—between the needs of the open source community and the needs of the commercial product. Developing strategies to deal with this divergence is the key to successful reuse of open source community documentation.

Let's look at sources of potential friction between open source content and product content.

Documentation Format or Markup Language

The choice of documentation format for the open source content is one of the most important factors affecting reuse. On the one hand, the format should be easy to learn and acceptable to community contributors—but on the other hand, the format should (ideally) also support sophisticated layout options and it should be possible to integrate it into your product documentation builds.

Some of the currently popular (and not so popular) formats for open source documentation include:

  • AsciiDoc
  • Markdown
  • Wiki markup
  • DocBook
  • DITA
  • LaTeX

In most cases, if you are serious about content reuse, it is worth switching your product documentation to match the format of the corresponding community documentation. In my experience, tools for converting between documentation formats are not accurate or reliable enough to let you switch back and forth between formats automatically. Such tools are useful for a once-off conversion (followed by manual cleanup), but are unwieldy for tracking ongoing changes in the open source content.

Documentation Build Systems

Documentation build systems can also have an impact on content reusability. Although this impact is usually not too severe, build systems can mandate a specific directory structure, and can affect how you use variables and conditional text. It is worth checking these constraints before you commit to a reuse strategy.

Here are some examples of popular documentation build systems used by open source communities:

  • Antora—for AsciiDoc
  • Asciidoctor—for AsciiDoc
  • GitBook—for Markdown or AsciiDoc
  • DocBook XSL—for DocBook

Divergence of Content between Open Source and Product

The main challenge you typically face when reusing open source documentation is managing the divergence between the open source content and the product content. There are a variety of ways in which content can diverge and some kinds of content in the community documentation might need to be modified, as follows:

  • Project names—typically need to be replaced by branded product names.
  • Features unsupported in the product—need to be excluded from the product documentation.
  • Words like "supports" and "supported"—have a very different meaning, depending on whether they occur in the context of open source / community documentation or in product documentation. In community documentation, "we support feature X" simply means that "feature X" is available in the code (and you may use it at your own risk). But in a product, "we support feature X" means that your company is legally obliged to provide support for feature X.
  • Links to unwanted content—community contributors have a different yardstick for deciding what is appropriate content to link to, as compared to (corporate) product documentation. For example, community content may link to blog posts that quickly go out of date, or even link to blog posts written by a commercial rival of your company.
  • Links to code examples—community content often features links to (or direct inclusions of) example code. In the context of a product, such examples are often repackaged and re-released as product examples. In this case, you would want to modify links in the documentation to point at the productized example code instead of the community examples.
  • Inappropriate content—community content occasionally contains jokes or political comment that is inappropriate for product documentation.
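
Some of the items above can be flagged mechanically before a writer reviews them by hand. The following is a minimal sketch in Python, assuming the content is available as plain text. The allowlist of domains and the function name are hypothetical, chosen only for illustration; a real check would follow your own company's linking policy.

```python
import re

# Hypothetical policy: domains that the product documentation may link to.
ALLOWED_DOMAINS = {"docs.example.com", "example.com"}

# Matches "support", "supports" and "supported" as whole words.
SUPPORT_WORDS = re.compile(r"\bsupport(?:s|ed)?\b", re.IGNORECASE)
# Captures the host part of an http(s) URL.
URL_HOST = re.compile(r"https?://([^/\s:\[\]]+)")


def review_findings(text):
    """Return (line_number, reason) pairs that a writer should review by hand."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        if SUPPORT_WORDS.search(line):
            findings.append((number, "support wording"))
        for host in URL_HOST.findall(line):
            host = host.rstrip(".").lower()
            if host not in ALLOWED_DOMAINS:
                findings.append((number, f"external link: {host}"))
    return findings
```

A script like this only points at candidate lines. Deciding whether "supported" is appropriate, or whether a link should stay, remains a judgment call for the writer.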

Some additions and alterations might also be needed so that the content fits within the context of the product documentation:

  • Modifications to the structure of a section. For example, section headings and anchor IDs might need to follow different conventions in the product documentation. There might also be a need for some boilerplate content around the beginning and end of each section.
  • Modifications to images.
  • Modifications to code samples.
  • Additional content that appears exclusively in the product documentation—for example, features available in the product, but not in the community edition.

Managing the Divergences

There are various strategies and techniques we can use to manage the divergences discussed above:

  • Variables—provide an effective way of switching project names for product names, when reusing community content.
  • Conditional text—is an effective way to manage content that must appear only in the community documentation or only in the product documentation.
  • File organization—by separating out content you do not want to reuse into separate files, it becomes relatively easy to omit this content.
  • Scripting—you can process content taken from the community documentation with a script (for example, automatically searching and replacing content) as necessary.
  • Annotations + Scripting—you could add annotations (for example, in the form of comments) to the community content, which are then processed by scripts in order to reuse the content in the product.
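
As an illustration of the scripting approach, here is a minimal Python sketch that swaps community names for product names on the product side. The names in the mapping are hypothetical, and the function is one possible way to do this rather than a description of any particular toolchain.

```python
import re

# Hypothetical names, used only for illustration.
REPLACEMENTS = {
    "Acme Project": "Acme Enterprise Platform",
    "acme-cli": "acme-enterprise-cli",
}


def apply_product_names(text, replacements=REPLACEMENTS):
    """Replace each community name with its product name in a single pass."""
    if not replacements:
        return text
    # Longest names first, so a longer name wins over a shorter one it contains.
    names = sorted(replacements, key=len, reverse=True)
    # The lookarounds stop matches inside longer words or hyphenated identifiers.
    pattern = re.compile(
        r"(?<![\w-])(?:" + "|".join(re.escape(n) for n in names) + r")(?![\w-])"
    )
    return pattern.sub(lambda match: replacements[match.group(0)], text)
```

Doing all the replacements in one pass means that text produced by one replacement is never rewritten by another. Where the documentation format supports variables, defining the name once as a variable is usually cleaner than search and replace, and a script like this is a fallback for content you cannot change.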

Acceptability of Strategies for Managing Divergences

You might well find that some of the strategies suggested above for managing divergences are not usable in practice, because they are not acceptable to the community. Open source communities typically like to keep content neutral and do not like the idea of product-specific concerns encroaching on their content. Community attitudes can vary quite a lot, depending on how heterogeneous the make-up of the community is. We can broadly categorize communities as follows:

  • Monocultural—a community dominated by one company. In this case, when looking to refactor community content, you are effectively negotiating with your colleagues. This scenario is usually straightforward and gives you maximum flexibility for coordinating the open source documentation with the product documentation.
  • Moderately diverse—a community consisting of a small number of collaborating companies. In this case, there is likely to be a close level of collaboration amongst contributors and a good degree of openness to refactoring the community content.
  • Highly diverse—a community consisting of a large number of companies, including your commercial rivals. In this case, the community typically puts a premium on keeping the content neutral. Any hint of skewing the content to suit one particular product would trigger resistance and is likely to be rejected. This scenario leaves you with limited options for managing divergences between the open source content and the product content.

In the light of these community types, we need to reconsider the various strategies for managing divergences:

  • Variables—fairly uncontroversial and typically acceptable for all community types.
  • Conditional text—a borderline case. You might be able to gain acceptance for this approach—provided you employ product-neutral labels for the conditions—but some communities might reject the use of conditional text altogether.
  • File organization—probably uncontroversial, as long as you are proposing a clean, well-organized file structure. But it would probably involve some negotiation.
  • Scripting—uncontroversial, because it does not involve the community. All of your scripted modifications can be applied on the product side of the content.
  • Annotations + Scripting—this combination is more controversial, because it requires you to add annotations directly to the open source content. This might be acceptable if the annotations are couched in a product-neutral way. But there is a fairly high probability it could be rejected by the community.
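
To make the annotation approach concrete, here is a minimal Python sketch, assuming the community accepts comment-style markers with product-neutral wording. The marker text is hypothetical (the leading `//` follows the AsciiDoc line comment convention), and the script runs entirely on the product side.

```python
# Hypothetical, product-neutral markers placed in the community content.
BEGIN = "// reuse:upstream-only"
END = "// reuse:end"


def strip_upstream_only(text):
    """Drop every line between a BEGIN marker and its END marker, markers included."""
    kept = []
    skipping = False
    for number, line in enumerate(text.splitlines(), start=1):
        marker = line.strip()
        if marker == BEGIN:
            if skipping:
                raise ValueError(f"line {number}: nested {BEGIN!r}")
            skipping = True
        elif marker == END:
            if not skipping:
                raise ValueError(f"line {number}: {END!r} without a matching begin")
            skipping = False
        elif not skipping:
            kept.append(line)
    if skipping:
        raise ValueError(f"{BEGIN!r} was never closed")
    return "\n".join(kept)
```

Failing loudly on unbalanced markers is a deliberate choice in this sketch: a silently unclosed block would drop the rest of the file from the product documentation.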

Conclusion

In this blog post, we've considered the factors that affect reuse of open source content. A variety of techniques can help you manage the divergence between open source content and product content effectively, but these techniques are not always acceptable to the community you are collaborating with.

In our next blog post, we will take a look at a content management technique that could be used to manage divergences, even when you are blocked from making changes to the community content.

Friday 28 January 2022

Blog Revival in 2022

After a long stint working at Red Hat, I have decided to strike out on my own as a self-employed technical writing contractor (since January 2022). This has incidentally given me a window of opportunity to get back to my (rather long-neglected) blog and post some new entries on a variety of things I have learnt about technical writing in recent years.

The first theme I would like to tackle is the subject of reusing documentation from open source / community projects. This is an area I have been heavily involved in during recent years at Red Hat and I plan to make a series of posts on this subject over the next few weeks, with a new post planned for each Thursday. Stay tuned!