Difference between revisions of "User:Dekarl"

From XMLTV
(Known Issues with Character Encoding: update with encoding fixes)
(Proper TV Metadata Schema Bits and Pieces (Hi TVBrainz): Sesame Street started with shared seasons to branch off into a localized german branch)
 
 
= Known Issues with Character Encoding =

Which grabbers' status will turn red if we add character encoding checks?

*Need to verify that we can dump Perl strings at XMLTV::Writer and that it will do the right thing with regard to escaping anything outside $encoding into XML character references.
*{{grabber|ch_search}} data source sends windows-1252 labeled as iso-8859-1 (e.g. the Euro symbol). Newer HTML parsers are supposed to handle this correctly (they treat the iso-8859-1 label as windows-1252).
*{{grabber|hr}} {{grabber|no_gfeed}} {{grabber|se_swedb}} [http://repo.or.cz/w/nonametv.git/commitdiff/17bfeec55a6cc01adb9db4d8a78f0fb17cfde11d fix committed upstream]
*{{grabber|it}} stores category in utf-8
*{{grabber|se_swedb}} data source sends windows-1252 labeled as iso-8859-1 (data source Viasat)
*{{ticket|1910245}} should add a test for HTML entities in the generated XML. (Hint: &amp;acute; is not a predefined XML entity, so it is invalid XML!)
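
The encoding checks above can be sketched in Python. The grabbers and XMLTV::Writer are Perl, so this only illustrates the idea, and all helper names are hypothetical. It treats an iso-8859-1 label as windows-1252 (as HTML parsers following the WHATWG Encoding Standard do), escapes output text with numeric character references for anything the target encoding cannot represent, and checks that a fragment is well-formed XML, which fails for HTML-only entities such as &amp;acute;.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

LATIN1_LABELS = {"iso-8859-1", "iso8859-1", "latin1", "latin-1"}


def _cp1252_char(byte: int) -> str:
    """Decode one byte as windows-1252, keeping latin-1 for undefined bytes."""
    try:
        return bytes([byte]).decode("cp1252")
    except UnicodeDecodeError:
        # Python leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined in cp1252;
        # keep them as the latin-1 code point with the same value.
        return chr(byte)


def decode_page(raw: bytes, declared: str) -> str:
    """Decode page bytes, treating an iso-8859-1 label as windows-1252.

    In iso-8859-1, bytes 0x80-0x9F are control characters; sources that
    really send windows-1252 use them for characters like the Euro sign (0x80).
    """
    if declared.strip().lower() in LATIN1_LABELS:
        return "".join(_cp1252_char(b) for b in raw)
    return raw.decode(declared)


def to_xml_text(text: str, encoding: str) -> bytes:
    """Escape &, < and > and turn every character the output encoding
    cannot represent into a numeric character reference (&#NNNN;)."""
    return escape(text).encode(encoding, "xmlcharrefreplace")


def is_well_formed(fragment: str) -> bool:
    """True if the fragment parses as XML; HTML-only entities make it fail."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True
```

For example, `to_xml_text("€5", "iso-8859-1")` yields `&#8364;5`, and `is_well_formed` rejects a title containing `&acute;` because expat reports it as an undefined entity.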
  
 
= Issues with Time Zones =
*{{ticket|2483562}} {{grabber|huro}} in the Netherlands
*{{grabber|dk_dr}} DST issues
*{{grabber|il}} DST issues
*{{grabber|it}} DST issues
*{{grabber|pt_meo}} DST issues
*{{grabber|uk_bleb}} DST issues; floating start/stop times where start > stop lead to wrong date calculation and time offsets
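
Several of these DST issues come from turning floating local listing times into XMLTV timestamps with explicit offsets. A minimal Python sketch, assuming the EU rule (summer time from 01:00 UTC on the last Sunday in March to 01:00 UTC on the last Sunday in October) and a fixed standard offset per grabber; it does not cover Israel's rules ({{grabber|il}}), and a real grabber would rather use a time zone database. A stop time earlier than the start is moved to the next day before its offset is looked up. All function names are hypothetical.

```python
from datetime import date, datetime, time, timedelta, timezone
from typing import Tuple


def last_sunday(year: int, month: int) -> date:
    """Last Sunday of the given month."""
    first_of_next = date(year + month // 12, month % 12 + 1, 1)
    last_day = first_of_next - timedelta(days=1)
    # weekday(): Monday == 0 ... Sunday == 6
    return last_day - timedelta(days=(last_day.weekday() - 6) % 7)


def eu_utc_offset(local: datetime, std_hours: int) -> timedelta:
    """UTC offset of a naive local time under the EU summer-time rule."""
    dst_start = datetime.combine(last_sunday(local.year, 3), time(1))  # UTC
    dst_end = datetime.combine(last_sunday(local.year, 10), time(1))   # UTC
    standard = timedelta(hours=std_hours)
    summer = standard + timedelta(hours=1)
    # Checking summer time first resolves the repeated hour in October
    # to its first (summer time) occurrence.
    if dst_start <= local - summer < dst_end:
        return summer
    return standard


def xmltv_stamp(local: datetime, std_hours: int) -> str:
    """Format a naive local time as 'YYYYMMDDhhmmss +HHMM'."""
    tz = timezone(eu_utc_offset(local, std_hours))
    return local.replace(tzinfo=tz).strftime("%Y%m%d%H%M%S %z")


def listing_to_xmltv(day: date, start: str, stop: str,
                     std_hours: int) -> Tuple[str, str]:
    """Convert floating 'HH:MM' start/stop on a listing day to XMLTV times."""
    begin = datetime.combine(day, time.fromisoformat(start))
    end = datetime.combine(day, time.fromisoformat(stop))
    if end < begin:  # programme runs past midnight
        end += timedelta(days=1)
    return xmltv_stamp(begin, std_hours), xmltv_stamp(end, std_hours)
```

For a UK listing (`std_hours=0`), a programme from 23:30 to 00:30 on 25 October 2014 becomes `20141025233000 +0100` to `20141026003000 +0100`, while 12:00 on 26 October 2014 is already `+0000`.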
  
 
= Memory Leaks?? =
 
  
 
= Potential Data Sources =

Candidates for wrapping into [[User:Dekarl/Static_File_Grabber_Template|Static File Grabbers]]

* _cz_arcao: XMLTV export from [http://xmltv.arcao.com/ arcao.com]. Provides explicit time offsets.
* _dk_ontv: XMLTV export from [http://ontv.dk/xmltv/ ontv.dk].
* _eu_phazer: XMLTV service from tvprofil.net aka [http://tvprofil.net/xmltv/ Phazer XMLTV Service]. Note that they provide timestamps in their local time as floating time (without a UTC offset), which is then interpreted as UTC...
* <strike>_fr_kazer: XMLTV service from [http://kazer.org/ kazer.org].</strike> [http://xmltv.cvs.sourceforge.net/viewvc/xmltv/xmltv/grab/fr_kazer/ done]
* _it_ambrosa: XMLTV export from [http://www.ambrosa.net/index.php/contents/XMLTV.html ambrosa.net]. Explicitly restricted to non-commercial use.
* _ru_teleguide: XMLTV export from [http://www.teleguide.info/article1.html teleguide.info]. Provides explicit time offsets.
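
To illustrate the _eu_phazer note: XMLTV times have the form YYYYMMDDhhmmss with an optional offset such as +0200, and a time without an offset is taken as UTC. This Python sketch handles only the full 14-digit form (XMLTV also allows truncated forms), and its function names are hypothetical. It shows the resulting shift and how a wrapper could pin floating local times to an offset the caller supplies.

```python
from datetime import datetime, timedelta, timezone


def parse_xmltv_time(value: str) -> datetime:
    """Parse 'YYYYMMDDhhmmss' with an optional ' +HHMM' offset.

    Without an offset the time is taken as UTC, which is why floating
    local times end up shifted by the local UTC offset.
    """
    value = value.strip()
    if " " in value:
        return datetime.strptime(value, "%Y%m%d%H%M%S %z")
    naive = datetime.strptime(value, "%Y%m%d%H%M%S")
    return naive.replace(tzinfo=timezone.utc)


def pin_floating_time(value: str, utc_offset: timedelta) -> str:
    """Attach an offset to a floating time that is known to be local time.

    The caller must supply the offset valid at that instant (including DST).
    """
    local = datetime.strptime(value.strip(), "%Y%m%d%H%M%S")
    pinned = local.replace(tzinfo=timezone(utc_offset))
    return pinned.strftime("%Y%m%d%H%M%S %z")
```

A programme meant to start at 20:15 local summer time in central Europe (`20141014201500 +0200`) is read as starting two hours later when the offset is missing; `pin_floating_time("20141014201500", timedelta(hours=2))` restores `20141014201500 +0200`.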

= Configuration API =

* [http://sourceforge.net/mailarchive/forum.php?thread_name=49B6B9F2.80905%40holmlund.se&forum_name=xmltv-devel] possible extensions/clarifications from a consumer's POV
* A list in the supplementary files mapping DVB/ATSC IDs to grabber/channel. Then let --list-channels & co. enrich the channel list with the related IDs. [http://code.mythtv.org/trac/wiki/TaskBrowserBasedSetup MythTV's Browser Based Setup] might be a consumer for this, allowing automatic mapping of the channels in the guide to the channels on the video source.
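
The supplementary-file idea above could look roughly like this Python sketch. The line format, the sample content and all names are hypothetical; DVB services are keyed by (original_network_id, service_id), the pair that identifies a DVB service, and ATSC IDs would need their own key type.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

ServiceKey = Tuple[int, int]  # (original_network_id, service_id)
Mapping = Dict[ServiceKey, List[Tuple[str, str]]]  # -> [(grabber, xmltv_id)]


def parse_mapping(text: str) -> Mapping:
    """Parse hypothetical lines '<grabber> <xmltv_id> <onid> <sid>'.

    Everything after '#' is a comment; blank lines are ignored.
    """
    mapping: Dict[ServiceKey, List[Tuple[str, str]]] = defaultdict(list)
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        grabber, xmltv_id, onid, sid = line.split()
        mapping[(int(onid), int(sid))].append((grabber, xmltv_id))
    return dict(mapping)


def related_services(mapping: Mapping, grabber: str,
                     xmltv_id: str) -> List[ServiceKey]:
    """What --list-channels could add to a channel: its DVB services."""
    return sorted(key for key, targets in mapping.items()
                  if (grabber, xmltv_id) in targets)


def guide_channel_for(mapping: Mapping, grabber: str,
                      service: ServiceKey) -> Optional[str]:
    """What a setup UI could use: scanned DVB service -> xmltv id."""
    for name, xmltv_id in mapping.get(service, []):
        if name == grabber:
            return xmltv_id
    return None
```

Keeping one file for all grabbers lets both directions (grabber channel to DVB service, and scanned service to guide channel) come from the same data.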

= Data Sinks =

* Check [http://www.cse.unsw.edu.au/~willu/w/xmltv/grabbers/index.html] and see whether all of them are mentioned here

= Best Practices =

== Consumers of XMLTV Data ==

* Be prepared for xmltv ids that really look like FQDNs, with up to 255 characters; the longest I've seen in the wild is 69 characters (_es_laguiatv)
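
For a consumer, the practical consequence is to size storage for ids of up to 255 characters instead of truncating them. A sketch using Python's sqlite3 module with a hypothetical channel table; rejecting longer ids through a CHECK constraint is one possible design choice, not an XMLTV requirement.

```python
import sqlite3

# XMLTV ids can look like fully qualified domain names, so allow up to
# 255 characters instead of truncating to a shorter column size.
MAX_XMLTV_ID_LENGTH = 255


def open_channel_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database with a hypothetical channel table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS channel ("
        " xmltv_id TEXT PRIMARY KEY"
        f" CHECK (length(xmltv_id) BETWEEN 1 AND {MAX_XMLTV_ID_LENGTH}),"
        " display_name TEXT)"
    )
    return db


def add_channel(db: sqlite3.Connection, xmltv_id: str,
                display_name: str) -> None:
    # Store the id exactly as received; never shorten it to make it fit.
    db.execute("INSERT INTO channel VALUES (?, ?)", (xmltv_id, display_name))
```

An id of 256 characters makes the INSERT fail with `sqlite3.IntegrityError`, so an oversized id is reported instead of being silently cut.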

= Random Pieces of Information =

== You might receive the same transport stream on multiple frequencies ==

TS 101 211 - DVB Guidelines on implementation and usage of Service Information (SI)

NOTE 1: The cell_id cannot be used to identify a service. The combination of service_id and original_network_id remains a unique identification of a service.

It is recommended to make all receivable multiplexes with the same transport_stream_id but with different cell_ids available to the user, and only when a service (not a transport stream) is available through multiple multiplexes to select a preferred multiplex based on e.g. reception quality.

Any reference resolution from a transport_stream_id or a service_id (e.g. from a linkage_descriptor transport_stream_id/service_id pair) to a multiplex/frequency requires consideration to handle the potential multiplicity

Note that in networks deploying the service_availability_descriptor, the unique identification of a transport stream by the tuple (transport_stream_id, original_network_id), can often be sensibly replaced by identification through the triplet (transport_stream_id, original_network_id, cell_id).
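
Following these guidelines, a receiver can key services by (original_network_id, service_id) and pick one multiplex per service only when the same service shows up on several frequencies. A minimal Python sketch; the ScanHit record and its quality score are assumptions, not part of any DVB API.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class ScanHit:
    """One service seen during a channel scan (field set is an assumption)."""
    original_network_id: int
    transport_stream_id: int
    service_id: int
    cell_id: int
    frequency_hz: int
    quality: float  # hypothetical reception-quality score, higher is better


def service_key(hit: ScanHit) -> Tuple[int, int]:
    # service_id + original_network_id identify a service; cell_id does not.
    return (hit.original_network_id, hit.service_id)


def transport_stream_key(hit: ScanHit, use_cell_id: bool) -> Tuple[int, ...]:
    # Where service_availability_descriptor is deployed, cell_id may be added.
    base = (hit.transport_stream_id, hit.original_network_id)
    return base + (hit.cell_id,) if use_cell_id else base


def preferred_multiplexes(
        hits: Iterable[ScanHit]) -> Dict[Tuple[int, int], ScanHit]:
    """Keep every service once, on the multiplex with the best reception."""
    best: Dict[Tuple[int, int], ScanHit] = {}
    for hit in hits:
        key = service_key(hit)
        if key not in best or hit.quality > best[key].quality:
            best[key] = hit
    return best
```

Two hits of the same service from different cells collapse into one entry (the one with the better quality score), while their transport streams stay distinguishable when `use_cell_id=True`.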

== Proper TV Metadata Schema Bits and Pieces (Hi TVBrainz) ==

*Some series have multiple sets of episode titles per locale, usually one set per broadcasting company
**The Exes http://www.fernsehserien.de/the-exes/episodenguide/staffel-1/16232
*Some series have a different title per season
**Elephant Princess http://de.wikipedia.org/wiki/Elephant_Princess
*Some series have alternate titles
**The Killing http://de.wikipedia.org/wiki/Kommissarin_Lund_%E2%80%93_Das_Verbrechen This series also uses Roman or Arabic numerals in addition to the title as a season-specific series title.
*Usually the episode title is unique per series, but some series have multiple episodes with the same title
**Lindenstraße http://de.wikipedia.org/wiki/Lindenstra%C3%9Fe/Episodenliste#Mehrfache_Folgentitel
*Some series started on radio and continued on TV
** Die Hesselbachs http://de.wikipedia.org/wiki/Die_Hesselbachs
*Some series got rebranded
** Pusteblume -> Löwenzahn http://en.wikipedia.org/wiki/L%C3%B6wenzahn
*Many series don't do seasons
**Eisenbahnromantik http://www.swr.de/eisenbahn-romantik/ (lots of repeats on many TV stations and lots of full episodes on YouTube; good for testing VOD integration because there are no regional restrictions http://www.youtube.com/user/Eisenbahnromantik )
*Some episodes belong to multiple series
**It is common for documentary brands to buy unrelated documentary movies and mini-series that are run under a bigger brand
** - example of the two episodes that belong to two series goes here -
*Some series have a different order per country/station
**Wickie und die starken Männer http://forums.thetvdb.com/viewtopic.php?f=41&t=18059
*Some series contain the same episode multiple times / with multiple episode numbers
**Eisenbahnromantik http://www.swr.de/eisenbahn-romantik/
*For some series / collections of movies it is unclear whether they should be a series, a collection of movies, or both
**Varg Veum, has a main cast and seasons, and has used both a translated and the original season title http://de.wikipedia.org/wiki/Der_Wolf_%28Fernsehreihe%29
**Rosamunde Pilcher, shares no cast http://de.wikipedia.org/wiki/Rosamunde_Pilcher#Verfilmungen
**Inga Lindström, shares no cast http://de.wikipedia.org/wiki/Inga_Lindstr%C3%B6m#Inga-Lindstr.C3.B6m-Reihe
**Utta Danella, shares no cast http://de.wikipedia.org/wiki/Utta_Danella#Verfilmungen
**Harry Potter, a feature film series http://en.wikipedia.org/wiki/Harry_Potter_%28film_series%29
*Some series are coproduced for multiple locales, with most of an episode being shared internationally but parts being replaced locally (completely different shots/actors)
**Fraggle Rock, see http://muppet.wikia.com/wiki/Fraggle_Rock#Co-Productions
*Some series started out internationally as dubs and later branched off into unrelated shows of the same brand
**Sesame Street, see http://muppet.wikia.com/wiki/Sesamstrasse
*Some episodic movies are basically short movies pasted together
**http://en.wikipedia.org/wiki/Love_at_Twenty
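
Several of the cases above (episodes in more than one series, a different order per country/station, the same episode under several numbers) suggest storing episode numbers as positions in an ordering rather than as a fixed attribute of an episode. A minimal Python sketch of that idea; the class, functions and ids are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Ordering:
    """One running order of a series, valid for a country or station.

    The same episode may appear more than once (e.g. repeats that got
    new episode numbers).
    """
    series_id: str
    scope: str  # hypothetical: a country code or station name
    episode_ids: List[str] = field(default_factory=list)


def episode_numbers(orderings: List[Ordering], series_id: str,
                    scope: str, episode_id: str) -> List[int]:
    """All 1-based positions of an episode in one series ordering."""
    for o in orderings:
        if o.series_id == series_id and o.scope == scope:
            return [n for n, e in enumerate(o.episode_ids, start=1)
                    if e == episode_id]
    return []


def series_of(orderings: List[Ordering], episode_id: str) -> Set[str]:
    """An episode can belong to several series (e.g. a documentary brand)."""
    return {o.series_id for o in orderings if episode_id in o.episode_ids}
```

The episode itself then carries no number at all; each station or country ordering, and each brand that runs it, simply lists it.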

The set of titles (series/season/episode) should mark each title as either a true alternate or just a typo/search hint.

Some titles are international (used for all languages without localization); showing a poster with the international title is better than defaulting to a specific locale.
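
Both points can be captured in a small title model. In this Python sketch (all names hypothetical), search hints are used for matching but never displayed, and a title without a locale counts as international, so it wins over an arbitrary other locale when the viewer's locale has no title of its own.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class TitleKind(Enum):
    MAIN = "main"
    ALTERNATE = "alternate"      # a true alternate title
    SEARCH_HINT = "search_hint"  # typo or variant spelling, search only


@dataclass(frozen=True)
class Title:
    text: str
    kind: TitleKind
    locale: Optional[str] = None  # None marks an international title


def display_title(titles: List[Title], locale: str) -> Optional[Title]:
    """Prefer the locale's own title, then an international one, then any."""
    shown = [t for t in titles if t.kind is not TitleKind.SEARCH_HINT]
    for wanted in (
        lambda t: t.locale == locale and t.kind is TitleKind.MAIN,
        lambda t: t.locale == locale,
        lambda t: t.locale is None,
    ):
        for t in shown:
            if wanted(t):
                return t
    return shown[0] if shown else None


def matches(titles: List[Title], query: str) -> bool:
    """Search uses every title, including the typo/search hints."""
    q = query.casefold()
    return any(q in t.text.casefold() for t in titles)
```

With an international main title, a German title and a misspelled search hint, a German viewer sees the German title, a French viewer the international one, and a search for the misspelling still finds the series.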

Latest revision as of 06:13, 14 October 2014

Cleanup List

Feel free to take anything from the list

  • #1880681 the bug was solved (not a bug), the suggestion for Supplementary Files is turning one can of worms into another...
  • tv_grab_huro has no maintainer?
    • #2748362 site changes: holes in the collected programs
    • #2837668 site changes: unexpected hash references
    • #2858285 close it, was "no channels found", the grabber does not fail completely anymore (see status)
    • #2910015 close it, was "no programs on channel 1"; test_grabbers tests with that channel are successful

Cleanup SourceForge Project

  • remove group/status/category examples from the tracker (might check for other unused stuff while there)

Check for Breakage caused by LWP::Simple

  • silent decompression
  • silent code page conversion
  • silent proxy handling

Maybe it's best to move most uses over to our own Get_nice.
